Abstract

This work introduces a generic framework, inspired by [2, 50, 82], called the
bag-of-paths (BoP), that can be used for link and network data analysis. The
primary application of this framework, investigated in this paper, is the
definition of distance measures between nodes enjoying some nice properties.
More precisely, let us assume a weighted directed graph G where a cost is
associated to each arc. Within this context, consider a bag containing all the
possible paths between pairs of nodes in G. Then, following [50], a probability
distribution on this countable set of paths through the graph is defined by
minimizing the total expected cost between all pairs of nodes while fixing the
total relative entropy spread in the graph. This results in a Boltzmann
distribution on the set of paths such that long (high-cost) paths have a low
probability of being sampled from the bag, while short (low-cost) paths have a
high probability of being sampled. Within this probabilistic framework, the BoP
probabilities, P(s = i, e = j), of drawing a path starting from node i (s = i)
and ending in node j (e = j) can easily be computed in closed form by a simple
matrix inversion. Various applications of this framework are currently
investigated, e.g., the definition of distance measures between the nodes of G,
betweenness indexes, network criticality measures, edit distances, etc. As a
first step, this paper describes the general BoP framework and introduces two
families of distance measures between nodes. In addition to being a distance
measure, one of these two quantities has the interesting property of
interpolating between the shortest path and the commute cost distances.
Experimental results on semi-supervised tasks show that these distance families
are competitive with other state-of-the-art approaches.

1. Introduction

Network and link analysis is a highly studied field and subject of much re- 
cent work in various areas of science: applied mathematics, computer science, 
social science, physics, chemistry, pattern recognition, applied statistics, data 
mining & machine learning, to name a few [55, 23, 46, 79, 41, 17, 75]. Within 
this context, one key issue is the proper quantification of the similarity between
the nodes of a graph, capturing their structural relationship by taking both di-
rect and indirect connections into account.

This is exactly the main subject of this work, which is directly inspired by 
the models developed in [2, 63], in the context of stochastic path planning, and 
already exploited in [50, 82], in order to define a family of dissimilarity mea- 
sures between nodes. Indeed, this paper tackles this problem by first defining 
a bag-of-paths (BoP) framework capturing the global structure of the graph by 
using, as a building block, paths on the graph. Within this probabilistic frame- 
work, various quantities of interest can be derived in a principled way, such 
as (1) families of distances between nodes reflecting the structural proximity 
of the nodes, (2) betweenness measures quantifying to what extent a node lies
between two sets of nodes [45], (3) extensions - based on paths instead of
direct links - of the modularity criterion for community detection, (4) edit dis- 
tances quantifying the distance between strings (by computing the distance on 
a directed acyclic graph as in [26]) and (5) robustness measures capturing the 
criticality of the nodes or the network. 

The first proposition will be investigated in the present paper; the other 
applications are left for subsequent papers and work. 

More precisely, we assume a weighted directed graph or network G where 
a cost is associated to each arc. Within this context, we consider a bag con- 
taining all the possible (either absorbing or non-absorbing) paths (or walks) 
between pairs of nodes in G. In a first step, following [2, 50, 63, 82], a probabil- 
ity distribution on this countable set of paths through the graph can be defined 
by minimizing the total expected cost between all pairs of nodes while fixing 
the total relative entropy spread in the graph. This results in a Boltzmann dis- 
tribution, depending on a temperature parameter T, on the set of paths such 
that long (high-cost) paths have a low probability of being sampled from the 
bag, while short (low-cost) paths have a high probability of being picked. 

In this probabilistic framework, the BoP probabilities, P(s = i, e = j), of 
sampling a path starting in node i and ending in node j can easily be computed
in closed form by a simple n x n matrix inversion, where n is the number
of nodes in the graph. These BoP probabilities play a crucial role in our
framework since they capture the similarity between two nodes i and j - the BoP
probability will be high when the two nodes are connected by many short
paths. In summary, the BoP framework has several interesting properties:

• It has a clear, intuitive, interpretation. 

• The temperature parameter allows one to tune the degree of randomness by
controlling the balance between exploitation and exploration.

• The introduction of independent costs results in a large degree of cus- 
tomization of the model, according to the problem requirements: some 
paths could be penalized because they visit undesirable nodes having 
adverse features. For example, one could want to avoid hub nodes by 
discouraging the passage through high-degree nodes (the cost could then 
be set to the degree of the node). Or we may want to favor some features 
like the age category of people in a social network. 

• Many useful quantities of interest can be defined according to the proba-
bilistic framework: distance measures, betweenness measures, etc.

• The quantities of interest are very easy to compute.

It also suffers, however, from a severe drawback: the different quantities
are computed by solving a system of linear equations, requiring a matrix
inversion. This results in O(n^3) computational complexity. Even more
importantly, the matrix of distances requires O(n^2) storage, although this can
be alleviated by using, e.g., incomplete matrix factorization techniques. This
means that the different quantities can only be computed reasonably on small to
medium graphs (containing a few thousand nodes).

After introducing the BoP framework, one important application of the 
framework is considered - the definition of families of distance measures be- 
tween graph nodes taking into account the structure of the graph. Two such 
distances satisfying the triangle inequality are introduced. The distance be- 
tween a particular node and all the other nodes can be computed efficiently by 
solving a system of n linear equations. On the other hand, the matrix contain- 
ing the distances between all pairs of nodes can be computed by inverting an 
n x n square matrix. 

Moreover, in addition to being a distance measure, one of these two func- 
tions has the interesting property of nicely generalizing the shortest path and 
the commute cost distances by computing an intermediate distance, depend- 
ing on a temperature parameter T. When T is close to zero, the distance re- 
duces to the standard shortest path distance (emphasizing exploitation) while 
for T → ∞, it reduces to the commute cost distance (focusing on exploration),
related to the resistance distance [25, 40]. A local recurrence formula, extending 
Bellman-Ford's formula, for computing the distance from one node of interest 
is also derived. 

Finally, our experiments show that these distance families provide compet- 
itive results in semi-supervised learning. 

2. Some related work 

This work is related to similarity measures on graphs, for which some
background is presented in subsection 2.1. The presented BoP framework also has
applications in semi-supervised classification, on which the experimental
section (Section 5) focuses. A short survey related to this problem can be
found in subsection 2.2.

2.1. Similarity measures on a graph 

Similarity measures on a graph determine to what extent two nodes in a 
graph resemble each other, either based on the information contained in the 
node attributes or based on the graph structure. In this work, only measures 
based on the graph structure will be investigated. Structural similarity mea- 
sures can be categorized into two groups: local and global [47]. Local similar- 
ity measures between nodes consider the direct links from a node to the other 
nodes as features and use these features in various ways to provide
similarities. Examples are the cosine coefficient [22] and the standard
correlation [79].
On the other hand, global similarity measures consider the whole graph
structure to compute similarities. Our short review of similarity measures is
largely inspired by the surveys appearing in [24, 50, 81, 82].

First, similarity measures can be based on random walk models on the
graph, seen as a Markov chain. As an example, the commute time (CT) kernel
has been introduced in [64, 25] and was inspired by the work of Klein &
Randić [40] and Chandra et al. [10]. More precisely, Klein & Randić [40]
suggested using the effective resistance between two nodes as a meaningful
distance measure, called the resistance distance. Also, a close link between
the effective resistance and the commute time of a random walker on the graph
was highlighted in [10]. In short, the commute time kernel takes its name from
the average commute time measure, defined as the average number of steps
that a random walker, starting in a given node, takes to enter another node
for the first time (average first-passage time [56]) and then go back to the
initial node. It was shown in [64, 25] that the elements of L+, the
pseudoinverse of the Laplacian matrix L of the graph, are inner products of
the node vectors in a Euclidean space where these node vectors are exactly
separated by the square root of the commute time distance, which is therein
called the Euclidean commute time distance. The relationships between the
Laplacian matrix and the commute cost distance (the expected cost of reaching
a destination node from a starting node and going back to the starting node)
were studied in [25]. Finally, an electrical interpretation of the elements
of L+ can be found in [81].

Sarkar et al. [65] suggested a fast method for computing truncated commute 
time neighbors. At the same time, several authors defined an embedding that 
preserves the commute time distance with applications in various fields such 
as clustering [84], collaborative filtering [25, 8], dimensionality reduction of 
manifolds [29] and image segmentation [60]. 

Instead of taking the pseudoinverse of the Laplacian matrix, a simple regu- 
larization leads to a kernel called the regularized commute time kernel [32, 14, 
15]. Ito et al. [32] further proposed the modified regularized Laplacian kernel
by introducing another parameter controlling the importance of nodes. This
modified regularized Laplacian kernel is also closely related to a graph
regularization framework introduced by Zhou & Schölkopf in [88], extended to
directed graphs in [87].

The exponential diffusion kernel, introduced by Kondor & Lafferty [43], and
the von Neumann diffusion kernel, introduced in [66], are similar: both are
based on the sum of a power series of the adjacency matrix. A meaningful
alternative to the exponential diffusion kernel, called the Laplacian
exponential diffusion kernel (see [43, 69]), is a diffusion model that
substitutes the Laplacian matrix for the adjacency matrix.

Random walk with restart kernels, inspired by the PageRank algorithm and 
adapted to provide relative similarities between nodes, appeared relatively re- 
cently in [59, 57, 77]. Nadler et al. [53, 54] and Latapy et al. [44, 58] suggested 
a distance measure between nodes of a graph based on a diffusion process, 
called the "diffusion distance". The Markov diffusion kernel has been derived 
from this distance measure in [24] and [83]. The natural embedding induced 
by the diffusion distance was called "diffusion map" by Nadler et al. [53, 54] 
and is related to correspondence analysis [83]. 

More recently, Mantrach et al. [51], inspired by [2] and subsequently by [63], 
introduced a link-based covariance measure between nodes of a weighted di- 
rected graph, called the sum-over-paths (SoP for short). They defined a
probability distribution on the set of paths which results in a Boltzmann
distribution such that high-cost paths occur with low probability while short
paths occur with a high probability. Two nodes are then considered as highly
similar if they often co-occur on the same - preferably short - path. A related
co-betweenness measure between nodes has been defined in [42]. Moreover, 
also inspired by [2, 63], a parametrized family of dissimilarity measures, called 
the randomized shortest path (RSP) dissimilarity, reducing to the shortest path 
distance at one end of the parameter range, and to the commute time distance 
at the other end was proposed in [82]. Subsequently, similar ideas appeared in 
[13], based on considering the co-occurrences of nodes in forests of a graph, and
in [3], based on a generalization of the effective resistance in electric circuits. 
These two last families are metrics while the RSP dissimilarity does not satisfy 
the triangle inequality. 

2.2. Graph based semi-supervised classification 

Semi-supervised graph node classification has received increasing
interest in recent years (see [1, 11, 30, 89, 90] for surveys) and several categories of
approaches have been suggested. Among them, we may cite random walks 
[88, 71, 9], graph mincuts [6], spectral methods [12, 69, 43, 36], regularization 
frameworks [4, 78, 80, 86, 87], transductive and spectral SVMs [34], to name 
a few. Only the approaches investigated in this paper are reviewed; see the 
survey papers for more information. 

Still another family of approaches is based on kernel methods, which em- 
bed the nodes of the input graph into a Euclidean feature space where a deci- 
sion boundary can be estimated using standard kernel semi-supervised meth- 
ods, such as SVM. Fouss et al.[24] investigated nine graph kernels with appli- 
cations to collaborative recommendation and semi-supervised classification. 
Kernels must be positive semi-definite matrices and are usually obtained
through similarity measures like the ones introduced in the subsection above.
This approach has proven to be quite competitive, but a naive application of
these graph-kernel approaches does not scale well since it relies on the
computation of a dense similarity matrix, which usually requires a matrix
inversion. Zhou et al. [86, 87] suggested a way to avoid computing each
pairwise measure by solving a system of linear equations instead. Following that idea,
Mantrach et al. [49] introduced three algorithms to address within-network 
semi-supervised classification tasks on large, sparse and directed graphs, with 
a linear computing time in the number of edges. 

Another category of methods relies on random walks performed on a weighted 
and possibly directed graph seen as a Markov chain. The random walk with 
restart [57, 76, 77], directly inspired by the well known PageRank algorithm, is 
one of them. The D-walks method [9] belongs to the same category. It defines,
for each class, a group betweenness measure based on passage times during 
special random walks of bounded length. Those walks are constrained to start 
and end in nodes within the same class, defining distinct random walks for 
each class. Each node of the graph is traversed by these walks and the number 
of passage times on nodes for each type of random walk is computed, therefore 
defining a distinct betweenness for each class. 

The main advantage of those approaches is that class labels can be com-
puted very efficiently (in linear time) while providing competitive results on a
number of semi-supervised tasks.

3. The bag-of-paths framework



3.1. Background and notation 

We now introduce the bag-of-paths framework providing both a related- 
ness index and a distance measure between nodes of a weighted directed graph. 
Consider a weighted directed graph or network, G, not necessarily strongly 
connected, with a set V of n nodes (or vertices) and a set E of arcs (or edges).
In the sequel, column vectors are written in bold lowercase while matrices are 
in bold uppercase. 

Roughly speaking, the BoP model will be based on the probability that a 
path picked from a "bag of paths" has nodes i and j as its starting and ending 
nodes, respectively. We assume a bag containing objects (in information
retrieval, these objects would be words), with two properties:

• First, the objects that are picked are paths of arbitrary length. 

• Second, each path will be weighted according to its quality, i.e. its total 
cost. The likelihood of picking a low-cost path will be higher than picking 
a high-cost path - low-cost paths being therefore favored. 

According to this model, the probability of picking a path starting in node 
i and ending in node j from the bag-of-paths can easily be computed in closed 
form. This probability distribution serves as a building block for several exten- 
sions, such as a distance measure between nodes, etc. 

More precisely, it is assumed, as usual, that we are given an adjacency
matrix A with elements $a_{ij} \ge 0$ quantifying in some way the affinity
between node i and node j. From this adjacency matrix, a standard random walk
on the graph is defined in the usual way: the transition probabilities
associated to each node are simply proportional to the affinities (and
normalized):

$$
p^{\mathrm{ref}}_{ij} = \frac{a_{ij}}{\sum_{j'=1}^{n} a_{ij'}}
\tag{1}
$$

The matrix $\mathbf{P}^{\mathrm{ref}}$, containing the $p^{\mathrm{ref}}_{ij}$, is stochastic: its
elements are non-negative and each of its rows sums to one. This matrix is
usually called the transition matrix of the natural random walk on the graph.
These transition probabilities will be used as reference probabilities later;
hence the superscript "ref".
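
Equation (1) amounts to a row normalization of the adjacency matrix. The
following is a minimal NumPy sketch, assuming a dense matrix in which every
node has at least one outgoing arc; the function name
`reference_transition_matrix` and the 4-node example graph are hypothetical
and only serve as an illustration.

```python
import numpy as np

def reference_transition_matrix(A):
    """Row-normalize a non-negative adjacency matrix A (Equation (1))."""
    A = np.asarray(A, dtype=float)
    row_sums = A.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        # Assumption of this sketch: nodes without outgoing arcs are rejected.
        raise ValueError("every node needs at least one outgoing arc")
    return A / row_sums

# Hypothetical weighted directed graph; entry (i, j) is the affinity a_ij.
A = np.array([[0, 2, 1, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 3],
              [1, 0, 1, 0]])
P_ref = reference_transition_matrix(A)
```

Each row of `P_ref` sums to one, and missing arcs keep a zero transition
probability.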

Moreover, we assume that, in addition, an immediate cost of transition,
$c_{ij}$, is associated to each link i → j of the graph G. If there is no link
between i and j, the cost is assumed to take a large value, denoted by
$c_{ij} = \infty$. The cost matrix C is the matrix containing the immediate
costs $c_{ij}$ as elements. A path p (also called a walk in the literature) is
a sequence of jumps to adjacent nodes on G (including loops), initiated from a
starting node s = i, and stopping in an ending node e = j. The total cost of a
path p is simply the sum of the local costs along p. On the other hand, the
length of a path is the number of steps, or jumps, needed for following that
path.

Costs are set independently of the adjacency matrix; they are supposed to
quantify the cost of a transition, according to the problem at hand. Costs can,
e.g., be set as a function of some properties, or features, of the nodes or the
arcs in order to bias the probability distribution of choosing a path. In the
case of a social network, we may, for instance, want to bias the paths in favor
of young persons. In that case, the cost of jumping to a node could be set
proportional to the age of the corresponding person. Therefore, walks visiting
a large proportion of older persons would be penalized versus walks visiting
younger persons. Another example aims to favor hub-avoiding paths by penalizing
paths visiting hubs. In that case, the cost can be set to the degree of the
node. Actually, the costs play the role of an external potential V(i) and
low-potential paths are favored. If there is no reason to bias the paths with
respect to some features, costs are simply set equal to 1 (paths are penalized
by their length) or equal to $c_{ij} = 1/a_{ij}$ (the elements of the adjacency
matrix can then be considered as conductances and the costs as resistances).
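
The three cost choices mentioned above can be sketched as follows. This is
only an illustration on a hypothetical 4-node graph: missing arcs receive an
infinite cost, and the hub-avoiding variant assumes that "degree" means the
out-degree of the node being jumped to, which is one possible reading for a
directed graph.

```python
import numpy as np

# Hypothetical affinity matrix; a zero entry means "no arc".
A = np.array([[0, 2, 1, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 3],
              [1, 0, 1, 0]], dtype=float)
linked = A > 0

# Unit costs: each path is penalized by its length.
C_unit = np.where(linked, 1.0, np.inf)

# Resistance-like costs c_ij = 1 / a_ij (affinities seen as conductances).
C_resist = np.full_like(A, np.inf)
C_resist[linked] = 1.0 / A[linked]

# Hub-avoiding costs: jumping to node j costs the out-degree of j (assumed).
out_degree = linked.sum(axis=1)
C_hub = np.where(linked, out_degree[np.newaxis, :], np.inf)
```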

3.2. A Boltzmann distribution on the set of paths 

The present section describes how the probability distribution on the set of
paths is assigned. To this end, let us first choose two nodes, a starting node
i and an ending node j, and define the set of paths (including cycles) of
length t connecting these two nodes as $\mathcal{P}_{ij}(t) = \{p_{ij}(t)\}$.
Thus, $\mathcal{P}_{ij}(t)$ contains all the paths $p_{ij}(t)$ that reach node
j from node i in exactly t steps.

Let us further denote as $\tilde{c}(p_{ij}(t))$ the total cost associated to
path $p_{ij}(t)$. Here, we assume that $p_{ij}(t)$ is a valid path from node i
to node j, that is, every $c_{k_{\tau-1} k_{\tau}} \neq \infty$ along that path
containing the sequence of nodes
$(k_0 = i) \to k_1 \to k_2 \to \cdots \to (k_t = j)$. As already mentioned, we
assume that the total cost associated to a path is additive, i.e.
$\tilde{c}(p_{ij}(t)) = \sum_{\tau=1}^{t} c_{k_{\tau-1} k_{\tau}}$ where
$k_0 = i$ is the starting node and $k_t = j$ is the ending node while t is the
time (number of steps) needed to end the path in node j.

In addition, let us define the set of all t-length paths through the graph
between all pairs of nodes as
$\mathcal{P}(t) = \cup_{ij} \mathcal{P}_{ij}(t)$. Finally, the set of all
bounded paths up to length t is denoted by
$\mathcal{P}(\le t) = \cup_{\tau=0}^{t} \mathcal{P}(\tau)$.

Now, a probability distribution on this finite set $\mathcal{P}(\le t)$,
representing the probabilities of picking a path $p \in \mathcal{P}(\le t)$ in
the bag-of-up-to-t-length-paths, is defined as the probability distribution
P(p) minimizing the total expected cost-to-go, $E[\tilde{c}(p)]$, among all the
distributions having a fixed relative entropy $J_0$ with respect to a reference
distribution, for instance a natural random walk on the graph [50]. This choice
naturally defines a probability distribution on the set of paths of maximal
length t such that high-cost paths occur with a low probability while short
paths occur with a high probability. In other words, we are seeking path
probabilities, P(p), $p \in \mathcal{P}(\le t)$, minimizing the total expected
cost subject to a constant relative entropy constraint:



$$
\begin{aligned}
&\underset{\{P(p)\}}{\text{minimize}} && \sum_{p \in \mathcal{P}(\le t)} P(p)\, \tilde{c}(p) \\
&\text{subject to} && \sum_{p \in \mathcal{P}(\le t)} P(p) \ln\big(P(p)/P^{\mathrm{ref}}(p)\big) = J_0 \\
& && \sum_{p \in \mathcal{P}(\le t)} P(p) = 1
\end{aligned}
\tag{2}
$$

where $P^{\mathrm{ref}}(p)$ represents the probability of following the path p
when walking according to the reference distribution (natural random walk),
i.e. using transition probabilities $p^{\mathrm{ref}}_{ij}$ of the natural
random walk on G (see Equation (1)). More precisely, if path p of length t
consists of the sequence of nodes $k_0 \to k_1 \to \cdots \to k_t$, we define
$\pi^{\mathrm{ref}}(p) = \prod_{\tau=1}^{t} p^{\mathrm{ref}}_{k_{\tau-1} k_{\tau}}$,
that is, the product of the transition probabilities along path p - the
likelihood of the path when the starting and ending nodes are known. Now, if
we assume a uniform a priori probability, 1/n, for choosing the starting and
the ending node, then
$P^{\mathrm{ref}}(p) = \pi^{\mathrm{ref}}(p) / \sum_{p' \in \mathcal{P}(\le t)} \pi^{\mathrm{ref}}(p')$,
which ensures that the reference probability is properly normalized (see
footnote 1).

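Here, $J_0 > 0$ is provided a priori by the user, according to the desired
degree of randomness the user is willing to concede. Minimizing the following
Lagrange function

$$
\mathcal{L} = \sum_{p \in \mathcal{P}(\le t)} P(p)\, \tilde{c}(p)
+ \lambda \left[ \sum_{p \in \mathcal{P}(\le t)} P(p) \ln \frac{P(p)}{P^{\mathrm{ref}}(p)} - J_0 \right]
+ \mu \left[ \sum_{p \in \mathcal{P}(\le t)} P(p) - 1 \right]
$$

over the set of path probabilities P(p) by taking the partial derivative with
respect to P(p') [50] yields a Boltzmann probability distribution on the set of
paths up to length t:

$$
P(p) = \frac{P^{\mathrm{ref}}(p) \exp[-\theta \tilde{c}(p)]}
{\sum_{p' \in \mathcal{P}(\le t)} P^{\mathrm{ref}}(p') \exp[-\theta \tilde{c}(p')]}
\tag{3}
$$

where the Lagrange parameter $\lambda$ plays the role of a temperature T and
$\theta = 1/\lambda$ is the inverse temperature.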
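
The step from the Lagrange function to Equation (3) can be spelled out as
follows. The sketch below writes $\lambda$ for the multiplier of the relative
entropy constraint and $\mu$ for the multiplier of the normalization
constraint; the symbol $\mu$ is an assumed name, and all sums run over the
paths of length up to t.

```latex
\frac{\partial \mathcal{L}}{\partial P(p)}
  = \tilde{c}(p) + \lambda \left[ \ln \frac{P(p)}{P^{\mathrm{ref}}(p)} + 1 \right] + \mu = 0
\quad \Longrightarrow \quad
P(p) = P^{\mathrm{ref}}(p) \, \exp\!\left[ -\tilde{c}(p)/\lambda \right] \exp\!\left[ -1 - \mu/\lambda \right]

\sum_{p} P(p) = 1
\quad \Longrightarrow \quad
\exp\!\left[ -1 - \mu/\lambda \right]
  = \frac{1}{\sum_{p'} P^{\mathrm{ref}}(p') \exp\!\left[ -\tilde{c}(p')/\lambda \right]}
```

Substituting the second line into the first and setting $\theta = 1/\lambda$
gives the Boltzmann form of Equation (3).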

Thus, as expected, short paths p (having a low cost $\tilde{c}(p)$) are favored
in that they have a large probability of being followed. Indeed, from Equation
(3), we clearly observe that when $\theta \to 0$, the path probabilities reduce
to the probabilities generated by the natural random walk on the graph
(characterized by the transition probabilities $p^{\mathrm{ref}}_{ij}$ as
defined in Equation (1)). In this case, $J_0 \to 0$ as well. On the other hand,
when $\theta$ is large, the probability distribution defined by Equation (3) is
biased towards low-cost paths (the most likely paths are the shortest ones).
Notice that, in the sequel, it will be assumed that the user provides the value
of the parameter $\theta$ instead of $J_0$, with $\theta > 0$. Also notice that
the model could be derived thanks to a maximum entropy principle instead
[33, 37].
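
To make the behaviour of Equation (3) concrete, the following sketch
enumerates every path of length up to `t_max` on a small graph and computes
the Boltzmann distribution by brute force. The 3-node graph, its costs and the
function name `boltzmann_over_paths` are hypothetical; the code works with the
path likelihoods directly, since the normalization of the reference
probabilities cancels in Equation (3).

```python
import itertools
import numpy as np

# Hypothetical triangle graph with one expensive arc pair (0 <-> 2).
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
C = np.array([[np.inf, 1.0, 4.0],
              [1.0, np.inf, 1.0],
              [4.0, 1.0, np.inf]])
P_ref = A / A.sum(axis=1, keepdims=True)
n, t_max = A.shape[0], 3

def boltzmann_over_paths(theta):
    """Return (paths, probabilities, costs) for all paths of length 0..t_max."""
    paths, weights, costs = [], [], []
    for length in range(t_max + 1):
        for nodes in itertools.product(range(n), repeat=length + 1):
            steps = list(zip(nodes[:-1], nodes[1:]))
            if any(A[i, j] == 0 for i, j in steps):
                continue  # not a valid path: some arc is missing
            pi_ref = np.prod([P_ref[i, j] for i, j in steps])  # path likelihood
            cost = sum(C[i, j] for i, j in steps)              # additive cost
            paths.append(nodes)
            weights.append(pi_ref * np.exp(-theta * cost))
            costs.append(cost)
    weights = np.array(weights)
    return paths, weights / weights.sum(), np.array(costs)

paths, p_cold, costs = boltzmann_over_paths(theta=5.0)   # large theta
_, p_hot, _ = boltzmann_over_paths(theta=1e-9)           # theta close to 0
expected_cost_cold = p_cold @ costs
expected_cost_hot = p_hot @ costs
```

With a large value of theta the probability mass concentrates on low-cost
paths, so the expected cost is lower than for theta close to zero, where the
distribution reduces to the normalized reference likelihoods.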

In the next section, this idea will be generalized to unbounded paths by
taking the limit t → ∞.



3.3. The bag-of-paths probabilities 
3.3.1. Bounded paths 

The previous result (Equation (3)) assigns a probability distribution on all 
the possible paths of the graph, up to length t. We now consider a bag of 
such paths with a probability of picking a particular path p being provided by 
Equation (3). 

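Our BoP framework will be based on the computation of another important
quantity derived from Equation (3): the probability of picking a path starting
in some node s = i and ending in some other node e = j, which is provided by

$$
\begin{aligned}
P^{(\le t)}(s = i, e = j)
&= \frac{\sum_{p \in \mathcal{P}_{ij}(\le t)} P^{\mathrm{ref}}(p) \exp[-\theta \tilde{c}(p)]}
        {\sum_{p' \in \mathcal{P}(\le t)} P^{\mathrm{ref}}(p') \exp[-\theta \tilde{c}(p')]} \\
&= \frac{\sum_{p \in \mathcal{P}_{ij}(\le t)} \exp[-\theta \tilde{c}(p) + \ln \pi^{\mathrm{ref}}(p)]}
        {\sum_{p' \in \mathcal{P}(\le t)} \exp[-\theta \tilde{c}(p') + \ln \pi^{\mathrm{ref}}(p')]} \\
&= \frac{\sum_{p \in \mathcal{P}_{ij}(\le t)} \pi^{\mathrm{ref}}(p) \exp[-\theta \tilde{c}(p)]}
        {\sum_{p' \in \mathcal{P}(\le t)} \pi^{\mathrm{ref}}(p') \exp[-\theta \tilde{c}(p')]}
\end{aligned}
\tag{4}
$$

with $\mathcal{P}_{ij}(\le t)$ being the set of paths of length up to t
starting in node i and ending in node j. The reference probabilities
$P^{\mathrm{ref}}(p)$ can be replaced by the path likelihoods
$\pi^{\mathrm{ref}}(p)$ because their normalization factor cancels between the
numerator and the denominator. These paths can contain loops and could visit
nodes i and j several times during the trajectory (see footnote 2). This
quantity simply computes the probability mass of picking a path connecting i
to j divided by the total mass of probability.

Footnote 1: We will see later that the path likelihoods $\pi^{\mathrm{ref}}(p)$
are already properly normalized in the case of unbounded hitting, or absorbing,
paths: $\sum_{p' \in \mathcal{P}^{h}_{ij}} \pi^{\mathrm{ref}}(p') = 1$.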

Now, the analytical expression for computing the quantity defined by Equation
(4) will be derived in this subsection. Then, in the following subsection, its
definition will be extended to the set of paths of arbitrary length (unbounded
paths) by taking the limit t → ∞.

Let us find the analytical closed form of Equation (4). We start from the cost
matrix, C, from which we build a new matrix, W, as

$$
\mathbf{W} = \mathbf{P}^{\mathrm{ref}} \circ \exp[-\theta \mathbf{C}]
= \exp\left[ -\theta \mathbf{C} + \ln \mathbf{P}^{\mathrm{ref}} \right]
\tag{5}
$$

where $\mathbf{P}^{\mathrm{ref}}$ is the transition probability matrix (do not
confuse $\mathbf{P}^{\mathrm{ref}}$ in bold with $P^{\mathrm{ref}}(p)$
representing the reference probability of path p) of the natural random walk
on the graph containing the $p^{\mathrm{ref}}_{ij}$, and the
logarithm/exponential functions are taken elementwise. Moreover, $\circ$ is
the elementwise (Hadamard) matrix product. Notice that the matrix W is not
necessarily symmetric.
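
Equation (5) translates into a single elementwise product. The sketch below
uses the Hadamard form rather than the exponential-of-logarithm form, so that
missing arcs (zero transition probability, infinite cost) simply yield a zero
entry; the function name `build_W` and the example graph with unit costs are
assumptions made for the illustration.

```python
import numpy as np

def build_W(P_ref, C, theta):
    """W = P_ref * exp(-theta * C), taken elementwise (Equation (5))."""
    if theta <= 0:
        raise ValueError("theta must be strictly positive")
    return P_ref * np.exp(-theta * np.asarray(C, dtype=float))

# Hypothetical graph with unit costs on existing arcs.
A = np.array([[0, 2, 1, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 3],
              [1, 0, 1, 0]], dtype=float)
C = np.where(A > 0, 1.0, np.inf)   # infinite cost where there is no arc
P_ref = A / A.sum(axis=1, keepdims=True)
W = build_W(P_ref, C, theta=0.5)
```

Since all costs are strictly positive here, every row of `W` sums to less
than one.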

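Then, let us first compute the numerator of Equation (4). Since all the
quantities in the exponential of Equation (4) are summed along a path,
$\ln \pi^{\mathrm{ref}}(p) = \sum_{\tau=1}^{t} \ln p^{\mathrm{ref}}_{k_{\tau-1} k_{\tau}}$
and $\tilde{c}(p) = \sum_{\tau=1}^{t} c_{k_{\tau-1} k_{\tau}}$, where each link
$k_{\tau-1} \to k_{\tau}$ lies on path p, we immediately observe that element
i, j of the matrix $\mathbf{W}^{\tau}$ ($\mathbf{W}$ to the power $\tau$) is
$[\mathbf{W}^{\tau}]_{ij} = \sum_{p \in \mathcal{P}_{ij}(\tau)} \exp[-\theta \tilde{c}(p) + \ln \pi^{\mathrm{ref}}(p)]$,
where $\mathcal{P}_{ij}(\tau)$ is the set of paths connecting the starting node
i to the ending node j in exactly $\tau$ steps.

Consequently, the sum in the numerator of Equation (4) is

$$
\begin{aligned}
\sum_{p \in \mathcal{P}_{ij}(\le t)} \pi^{\mathrm{ref}}(p) \exp[-\theta \tilde{c}(p)]
&= \sum_{\tau=0}^{t} \sum_{p \in \mathcal{P}_{ij}(\tau)} \pi^{\mathrm{ref}}(p) \exp[-\theta \tilde{c}(p)] \\
&= \sum_{\tau=0}^{t} [\mathbf{W}^{\tau}]_{ij}
 = \left[ \sum_{\tau=0}^{t} \mathbf{W}^{\tau} \right]_{ij}
 = \mathbf{e}_i^{T} \left( \sum_{\tau=0}^{t} \mathbf{W}^{\tau} \right) \mathbf{e}_j
\end{aligned}
\tag{6}
$$

where $\mathbf{e}_i$ is a column vector of zeros except for a 1 in position i
and, by convention, at time step 0, the random walker appears in node i with
a unit probability and a zero cost: $\mathbf{W}^{0} = \mathbf{I}$. This means
that zero-length paths (without any transition step) are allowed in
$\mathcal{P}_{ij}(\le t)$. If, on the contrary, zero-length paths are
dismissed, we would have instead
$\sum_{p \in \mathcal{P}_{ij}(\le t)} \pi^{\mathrm{ref}}(p) \exp[-\theta \tilde{c}(p)] = \mathbf{e}_i^{T} \left( \sum_{\tau=1}^{t} \mathbf{W}^{\tau} \right) \mathbf{e}_j$
where $\mathcal{P}_{ij}(\le t)$ now denotes the set of nonzero-length bounded
paths from i to j - the initial time step is $\tau = 1$, i.e. all paths have at
least a length equal to 1 (one transition step). This alternative convention
will prove useful in Section 3.4 introducing the bag-of-hitting-paths model.

Footnote 2: Notice that another interesting class of paths, the hitting, or
absorbing, paths - allowing only one single visit to the ending node j - will
be considered in the next section.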

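Equation (6) allows us to derive the analytical form of the probability of
picking a bounded path (up to length t) starting in node i and ending in j.
Indeed, replacing Equation (6) in Equation (4), and recalling that
$\mathcal{P}(\le t) = \cup_{i,j=1}^{n} \mathcal{P}_{ij}(\le t)$, we obtain

$$
P^{(\le t)}(s = i, e = j)
= \frac{\mathbf{e}_i^{T} \left( \sum_{\tau=0}^{t} \mathbf{W}^{\tau} \right) \mathbf{e}_j}
       {\sum_{i',j'=1}^{n} \mathbf{e}_{i'}^{T} \left( \sum_{\tau=0}^{t} \mathbf{W}^{\tau} \right) \mathbf{e}_{j'}}
= \frac{\mathbf{e}_i^{T} \left( \sum_{\tau=0}^{t} \mathbf{W}^{\tau} \right) \mathbf{e}_j}
       {\mathbf{e}^{T} \left( \sum_{\tau=0}^{t} \mathbf{W}^{\tau} \right) \mathbf{e}}
\tag{7}
$$

where $\mathbf{e}$ is a column vector of ones. This allows computing the
probability of choosing a path starting in i and ending in j from the W matrix.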
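
As a sanity check of Equations (6) and (7), the sketch below accumulates the
powers of W up to a maximal length and compares one entry with a brute-force
sum over explicitly enumerated paths. The triangle graph, the unit costs and
the helper name `brute_force_mass` are hypothetical choices made for this
illustration.

```python
import itertools
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
C = np.where(A > 0, 1.0, np.inf)
P_ref = A / A.sum(axis=1, keepdims=True)
theta, t_max = 1.0, 4
W = P_ref * np.exp(-theta * C)

# Equation (7): S = sum_{tau=0}^{t} W^tau, normalized by the sum of its entries.
S = np.zeros_like(W)
W_power = np.eye(len(W))            # W^0 = I accounts for zero-length paths
for _ in range(t_max + 1):
    S += W_power
    W_power = W_power @ W
Pi_bounded = S / S.sum()

def brute_force_mass(i, j):
    """Sum of pi_ref(p) * exp(-theta * cost(p)) over paths i -> j, length <= t_max."""
    total = 1.0 if i == j else 0.0  # the zero-length path
    for length in range(1, t_max + 1):
        for mid in itertools.product(range(len(W)), repeat=length - 1):
            nodes = (i, *mid, j)
            # Each factor w_ab already equals p_ref_ab * exp(-theta * c_ab).
            total += np.prod([W[a, b] for a, b in zip(nodes[:-1], nodes[1:])])
    return total
```

Sequences that use a missing arc contribute a zero product, so the brute-force
sum only counts valid paths.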

Of course, there is no a priori reason to choose a particular path length; we 
will therefore consider paths of arbitrary length in the next section. 

3.3.2. Paths of arbitrary length 

Let us now consider the problem of computing the probability of picking
a path starting in i and ending in j from a bag containing paths of arbitrary
length, and therefore containing an infinite number of paths. Following the
definition in the bounded case (Equation (4)), this quantity will be denoted as
P(s = i, e = j) and defined by

$$
P(s = i, e = j)
= \frac{\sum_{p \in \mathcal{P}_{ij}} \pi^{\mathrm{ref}}(p) \exp[-\theta \tilde{c}(p)]}
       {\sum_{p' \in \mathcal{P}} \pi^{\mathrm{ref}}(p') \exp[-\theta \tilde{c}(p')]}
= \lim_{t \to \infty} P^{(\le t)}(s = i, e = j)
\tag{8}
$$

where $\mathcal{P}_{ij}$ is the set of all paths connecting i to j in the
graph, $\mathcal{P}$ is the set of all paths, and the denominator is called the
partition function,

$$
\mathcal{Z} = \sum_{p \in \mathcal{P}} \pi^{\mathrm{ref}}(p) \exp[-\theta \tilde{c}(p)]
\tag{9}
$$

The quantity P(s = i, e = j) in Equation (8) will be called the regular bag-of-
paths probability of picking a path with arbitrary length starting from node i
and ending in node j. Now, from Equation (7), we need to compute



$$
P(s = i, e = j) = \lim_{t \to \infty} P^{(\le t)}(s = i, e = j)
= \lim_{t \to \infty}
  \frac{\mathbf{e}_i^{T} \left( \sum_{\tau=0}^{t} \mathbf{W}^{\tau} \right) \mathbf{e}_j}
       {\mathbf{e}^{T} \left( \sum_{\tau=0}^{t} \mathbf{W}^{\tau} \right) \mathbf{e}}
\tag{10}
$$

We thus need to compute the power series of W

$$
\lim_{t \to \infty} \sum_{\tau=0}^{t} \mathbf{W}^{\tau}
= \sum_{t=0}^{\infty} \mathbf{W}^{t}
= (\mathbf{I} - \mathbf{W})^{-1}
\tag{11}
$$

which converges if the spectral radius of W is less than 1, $\rho(\mathbf{W}) < 1$.
Since the matrix W only contains non-negative elements, a sufficient condition
for $\rho(\mathbf{W}) < 1$ is that all its row sums are less than 1, which is
always achieved for $\theta > 0$ since the $c_{ij} > 0$ (see Equation (5)) - in
that case, W is substochastic. Therefore, the limit always exists provided that
$\theta > 0$, which is assumed for now. Now, if we pose

$$
\mathbf{Z} = (\mathbf{I} - \mathbf{W})^{-1}
\tag{12}
$$
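with W given by Equation (5), we can pursue the computation of the numerator
of Equation (10),

$$
\mathbf{e}_i^{T} \left( \sum_{t=0}^{\infty} \mathbf{W}^{t} \right) \mathbf{e}_j
= \mathbf{e}_i^{T} (\mathbf{I} - \mathbf{W})^{-1} \mathbf{e}_j
= \mathbf{e}_i^{T} \mathbf{Z} \mathbf{e}_j
= [\mathbf{Z}]_{ij} = z_{ij}
\tag{13}
$$

where $z_{ij}$ is element i, j of Z. By analogy with Markov chain theory, Z will
be called the fundamental matrix [38]. Elementwise, following Equations (6-13),
we have that

$$
z_{ij} = \sum_{p \in \mathcal{P}_{ij}} \pi^{\mathrm{ref}}(p) \exp[-\theta \tilde{c}(p)]
= \sum_{t=0}^{\infty} \left[ \mathbf{W}^{t} \right]_{ij}
= \left[ (\mathbf{I} - \mathbf{W})^{-1} \right]_{ij}
\tag{14}
$$

From the previous equation, $z_{ij}$ can be interpreted as

$$
\begin{aligned}
z_{ij} = \delta_{ij}
&+ p^{\mathrm{ref}}_{ij} e^{-\theta c_{ij}}
+ \sum_{i_1=1}^{n} p^{\mathrm{ref}}_{i i_1} p^{\mathrm{ref}}_{i_1 j}
  e^{-\theta c_{i i_1}} e^{-\theta c_{i_1 j}} \\
&+ \sum_{i_1=1}^{n} \sum_{i_2=1}^{n}
  p^{\mathrm{ref}}_{i i_1} p^{\mathrm{ref}}_{i_1 i_2} p^{\mathrm{ref}}_{i_2 j}
  e^{-\theta c_{i i_1}} e^{-\theta c_{i_1 i_2}} e^{-\theta c_{i_2 j}} + \cdots
\end{aligned}
\tag{15}
$$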
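
The convergence condition and the closed form of Equations (11) to (14) can be
checked numerically. The sketch below compares a truncated power series of W
with the matrix inverse on a hypothetical 4-node graph with unit costs; the
number of terms (200) and the value of theta (0.3) are arbitrary choices for
the illustration.

```python
import numpy as np

A = np.array([[0, 2, 1, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 3],
              [1, 0, 1, 0]], dtype=float)
C = np.where(A > 0, 1.0, np.inf)
P_ref = A / A.sum(axis=1, keepdims=True)
W = P_ref * np.exp(-0.3 * C)

# Convergence requires a spectral radius strictly below one.
spectral_radius = np.max(np.abs(np.linalg.eigvals(W)))

# Fundamental matrix, Equation (12).
Z = np.linalg.inv(np.eye(len(W)) - W)

# Partial sums of the power series, Equation (11).
partial_sum, term = np.zeros_like(W), np.eye(len(W))
for _ in range(200):
    partial_sum += term
    term = term @ W
truncation_error = np.max(np.abs(partial_sum - Z))
```

With unit costs, W is the reference transition matrix scaled by exp(-theta),
so its spectral radius equals exp(-theta) and the truncated series agrees with
the inverse up to rounding error.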

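On the other hand, for the denominator of Equation (8) and (10), we find

$$
\mathcal{Z} = \sum_{p \in \mathcal{P}} \pi^{\mathrm{ref}}(p) \exp[-\theta \tilde{c}(p)]
= \sum_{i,j=1}^{n} \left[ \sum_{t=0}^{\infty} \mathbf{W}^{t} \right]_{ij}
= \mathbf{e}^{T} \left( \sum_{t=0}^{\infty} \mathbf{W}^{t} \right) \mathbf{e}
= \mathbf{e}^{T} \mathbf{Z} \mathbf{e} = z_{\bullet\bullet}
\tag{16}
$$

where $z_{\bullet\bullet}$ is the sum of all the elements of Z, so that the
partition function is $\mathcal{Z} = z_{\bullet\bullet}$. Therefore, from
Equation (10), the probability of picking a path starting in i and ending in j
in our bag-of-paths model is simply

$$
P(s = i, e = j) = \frac{z_{ij}}{\mathcal{Z}},
\quad \text{with } \mathbf{Z} = (\mathbf{I} - \mathbf{W})^{-1}
\text{ and } \mathcal{Z} = z_{\bullet\bullet}
\tag{17}
$$

or, in matrix form,

$$
\mathbf{\Pi} = \frac{\mathbf{Z}}{z_{\bullet\bullet}},
\quad \text{with } \mathbf{Z} = (\mathbf{I} - \mathbf{W})^{-1}
\tag{18}
$$

where $\mathbf{\Pi}$, called the bag-of-paths probability matrix, contains the
probabilities for every starting-ending pair of nodes.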

An intuitive interpretation of the elements Zij of the Z matrix can be pro- 
vided as follows [63, 50]. Consider a special random walk defined by the 
transition-probabilities matrix W. Since W has each row sum less than one, 
the random walker has a nonzero probability of disappearing at each node i 
and each time step which is equal to (1 — Y^j=i w ij)- From Equation (5), it 
can be observed that the probability of surviving during a transition i -> j is 
proportional to exp[—6cij]. This interpretation makes sense: there is a smaller 
probability to survive edges with a high cost. In this case, the elements of the 
Z matrix, = ["Z]ij, can be interpreted as the expected number of passages 
through node j (see for instance [21, 38]) for an "evaporating" or "killing" ran- 
dom walker starting in node i and ending in j. 
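The closed form of Equations (12) and (18) can be checked numerically on a small example. The sketch below is only illustrative: the 4-node graph, the unit costs and the function name `bag_of_paths` are assumptions made for the example, and $W$ is built as in step 3 of Algorithm 1 (reference transition probabilities multiplied elementwise by $\exp[-\theta C]$).

```python
import numpy as np

def bag_of_paths(A, C, theta):
    """Return W, Z and Pi as in Equations (5), (12) and (18)."""
    A = np.asarray(A, dtype=float)
    C = np.asarray(C, dtype=float)
    P_ref = A / A.sum(axis=1, keepdims=True)   # reference transition probabilities
    W = P_ref * np.exp(-theta * C)             # elementwise product, substochastic for theta > 0
    Z = np.linalg.inv(np.eye(A.shape[0]) - W)  # fundamental matrix
    Pi = Z / Z.sum()                           # divide by the partition function (sum of all z_ij)
    return W, Z, Pi

# Hypothetical 4-node undirected graph with unit costs on existing edges.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
C = np.where(A > 0, 1.0, 0.0)
W, Z, Pi = bag_of_paths(A, C, theta=1.0)

# Truncated power series of Equation (11), for comparison with the closed form.
series = sum(np.linalg.matrix_power(W, t) for t in range(100))
print(np.allclose(series, Z), Pi.sum())
```

For larger graphs, solving linear systems with $(I - W)$ is usually preferable to forming the inverse explicitly; the explicit inverse is kept here only to mirror the equations.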

3.4. The bag-of-hitting-paths probabilities 

The bag-of-hitting-paths model described in this section is a restriction of 
the previously introduced bag-of-paths in which the ending node does not ap- 
pear more than once - at the end of the path. In other words, no intermediate 
node on the path is allowed to be the ending node j, thus prohibiting looping 
on this ending node j. Technically this constraint will be enforced by making 
the ending node absorbing, exactly as in the case of an absorbing Markov chain 
[21, 31, 38, 56]. We will see later in this section that this model has some nice 
properties. Two ways of defining hitting paths probabilities are presented in 
this section. 



Computation of the hitting paths probabilities 

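$\mathcal{P}^{h}_{ij}$ will be the set of hitting paths starting from $i$ and stopping once node $j$ has been reached for the first time ($j$ is made absorbing). Let $\mathcal{P}^{h} = \cup_{ij} \mathcal{P}^{h}_{ij}$ be the complete set of such hitting paths. Following the same reasoning as in the previous subsection, from Equation (8), when putting a Boltzmann distribution on $\mathcal{P}^{h}$, the probability of picking a hitting path starting in $i$ and ending in $j$ is

$$
P_h(s = i, e = j) = \frac{\sum_{\wp \in \mathcal{P}^{h}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)]}{\sum_{\wp' \in \mathcal{P}^{h}} \tilde{\pi}^{\mathrm{ref}}(\wp') \exp[-\theta \tilde{c}(\wp')]} = \frac{\sum_{\wp \in \mathcal{P}^{h}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)]}{\mathcal{Z}_h} \tag{19}
$$

and the denominator of this expression is also called the partition function, $\mathcal{Z}_h = \sum_{\wp \in \mathcal{P}^{h}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)]$, but for hitting paths this time. The quantity $P_h(s = i, e = j)$ will be called the bag-of-hitting-paths probability of picking a hitting path starting in $i$ and ending in $j$. Notice that in the case of unbounded hitting paths, one can show that $\tilde{P}^{\mathrm{ref}}$ is properly normalized, i.e., $\sum_{\wp \in \mathcal{P}^{h}} \tilde{P}^{\mathrm{ref}}(\wp) = 1$, and that $\tilde{P}^{\mathrm{ref}} = \tilde{\pi}^{\mathrm{ref}}$ if we assume a uniform reference probability for picking the starting and ending nodes.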

Obviously, even if we adopt the convention that zero-length paths are allowed, paths of length (number of steps) greater than 0 starting in node $i$ and ending in the same node $i$ (i.e., $i = j$) are prohibited and do not contribute to the sum. In that case, the zero-length path is the only allowed path starting and ending in $i$ and we set its $\tilde{\pi}^{\mathrm{ref}}$ equal to 1 (only one possible path). Now, following the same reasoning as in the previous section, the numerator of Equation (19) is

$$
\sum_{\wp \in \mathcal{P}^{h}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)] = e_i^{T} \left( \sum_{t=0}^{\infty} (W^{(-j)})^{t} \right) e_j = e_i^{T} (I - W^{(-j)})^{-1} e_j = e_i^{T} Z^{(-j)} e_j = z^{(-j)}_{ij} \tag{20}
$$

where $W^{(-j)}$ is now matrix $W$ of Equation (5) where the $j$th row has been set to $0^{T}$ and $Z^{(-j)} = (I - W^{(-j)})^{-1}$. This means that when the random walker reaches node $j$, he stops his walk and disappears. This matrix is given by $W^{(-j)} = W - e_j (w^{r}_{j})^{T}$ with $w^{r}_{j} = \mathrm{col}_j(W^{T}) = W^{T} e_j$ being a column vector containing the $j$th row of $W$.

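Alternatively, if we dismiss the paths of zero length (without any transition step), so that $\overline{\mathcal{P}}^{h}_{ij}$ is the set of hitting paths from $i$ to $j$ without taking zero-length paths into consideration,

$$
\sum_{\wp \in \overline{\mathcal{P}}^{h}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)] = e_i^{T} \left( \sum_{t=1}^{\infty} (W^{(-j)})^{t} \right) e_j = e_i^{T} \left( (I - W^{(-j)})^{-1} - I \right) e_j = e_i^{T} (Z^{(-j)} - I) e_j = z^{(-j)}_{ij} - \delta_{ij} \tag{21}
$$

Finally, if zero-length paths are not allowed,

$$
\overline{P}_h(s = i, e = j) = \frac{\sum_{\wp \in \overline{\mathcal{P}}^{h}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)]}{\sum_{\wp' \in \overline{\mathcal{P}}^{h}} \tilde{\pi}^{\mathrm{ref}}(\wp') \exp[-\theta \tilde{c}(\wp')]} = \frac{\sum_{\wp \in \overline{\mathcal{P}}^{h}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)]}{\overline{\mathcal{Z}}_h} \tag{22}
$$

where $\overline{P}_h(s = i, e = j)$ (with an overline) denotes the bag-of-hitting-paths probabilities excluding zero-length paths. Similarly, the set of nonzero-length paths is $\overline{\mathcal{P}}$. Hitting-path probabilities including zero-length paths are denoted by $P_h(s = i, e = j)$, as before. Both conventions (including and excluding zero-length paths) will prove useful in the sequel.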

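Now, all the entries of $Z^{(-j)}$ can be computed efficiently by the Sherman-Morrison formula in terms of the fundamental matrix $Z = (I - W)^{-1}$ (see [82] for a related development), providing (see Appendix A for details):

$$
z^{(-j)}_{ij} = [Z^{(-j)}]_{ij} = \frac{z_{ij}}{z_{jj}} \tag{23}
$$

Using this result, Equations (20) and (21) can be developed as follows. For Equation (20), we obtain

$$
\sum_{\wp \in \mathcal{P}^{h}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)] = z^{(-j)}_{ij} = \frac{z_{ij}}{z_{jj}} \tag{24}
$$

and for Equation (21),

$$
\sum_{\wp \in \overline{\mathcal{P}}^{h}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)] = e_i^{T} (Z^{(-j)} - I) e_j = \frac{z_{ij}}{z_{jj}} - \delta_{ij} \tag{25}
$$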
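Equation (23) can be verified numerically by comparing column $j$ of the directly inverted matrix with the closed form $z_{ij}/z_{jj}$. The following is a minimal sketch on randomly generated data; the affinities, costs, seed and the choice $j = 2$ are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta = 5, 0.5

# Hypothetical dense graph: random positive affinities and costs, no self-loops.
A = rng.random((n, n)) + 0.1
np.fill_diagonal(A, 0.0)
C = rng.random((n, n)) + 0.5

P_ref = A / A.sum(axis=1, keepdims=True)
W = P_ref * np.exp(-theta * C)
I = np.eye(n)
Z = np.linalg.inv(I - W)                 # fundamental matrix, Equation (12)

j = 2
W_minus_j = W.copy()
W_minus_j[j, :] = 0.0                    # row j set to zero: node j is absorbing
Z_minus_j = np.linalg.inv(I - W_minus_j)

column_direct = Z_minus_j[:, j]          # z^(-j)_ij obtained by direct inversion
column_closed_form = Z[:, j] / Z[j, j]   # right-hand side of Equation (23)
print(np.allclose(column_direct, column_closed_form))
```

The practical benefit of Equation (23) is that a single inversion of $(I - W)$ gives the required column for every absorbing node $j$, instead of one inversion per node.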
| Set of paths | Probability distribution | Description |
|---|---|---|
| $\mathcal{P}$ | $P(s = i, e = j)$ | Regular bag-of-paths probability based on non-hitting paths and including zero-length paths. |
| $\mathcal{P}^{h}$ | $P_h(s = i, e = j)$ | Bag-of-hitting-paths probability including zero-length paths. |
| $\overline{\mathcal{P}}$ | $\overline{P}(s = i, e = j)$ | Bag-of-paths probability based on non-hitting paths, but excluding zero-length paths. |
| $\overline{\mathcal{P}}^{h}$ | $\overline{P}_h(s = i, e = j)$ | Bag-of-hitting-paths probability excluding zero-length paths. |

Table 1: The different bag-of-paths probability distributions differing in whether zero-length paths are allowed or not and whether the paths are hitting paths (the ending node is absorbing) or not.



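The matrix containing the elements $z_{ij}/z_{jj}$ will be called $Z_h$ - the fundamental matrix of hitting paths - and, from the previous equation (24), is given by $Z_h = Z D_h^{-1}$ with $D_h = \mathrm{Diag}(Z)$. The elements of the matrix $Z_h$ will be denoted as $z^{h}_{ij}$ and, from Equation (24), are given by

$$
z^{h}_{ij} = \sum_{\wp \in \mathcal{P}^{h}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)] = \frac{z_{ij}}{z_{jj}} \tag{26}
$$

The diagonal elements of $Z_h$ are equal to 1, $z^{h}_{ii} = 1$. We immediately deduce the bag-of-hitting-paths probability including zero-length paths,

$$
P_h(s = i, e = j) = \frac{\sum_{\wp \in \mathcal{P}^{h}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)]}{\sum_{i',j'=1}^{n} \sum_{\wp' \in \mathcal{P}^{h}_{i'j'}} \tilde{\pi}^{\mathrm{ref}}(\wp') \exp[-\theta \tilde{c}(\wp')]} = \frac{z_{ij}/z_{jj}}{\sum_{i',j'=1}^{n} (z_{i'j'}/z_{j'j'})} = \frac{z_{ij}/z_{jj}}{\mathcal{Z}_h} \tag{27}
$$

where the denominator of Equation (27) is the partition function of the hitting paths model,

$$
\mathcal{Z}_h = \sum_{i,j=1}^{n} \sum_{\wp \in \mathcal{P}^{h}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)] = \sum_{i,j=1}^{n} (z_{ij}/z_{jj}) \tag{28}
$$

In matrix form, denoting by $\Pi_h$ the matrix of hitting paths probabilities $P_h(s = i, e = j)$ with zero-length paths,

$$
\Pi_h = \frac{Z D_h^{-1}}{e^{T} Z D_h^{-1} e}, \quad \text{with } Z = (I - W)^{-1} \text{ and } D_h = \mathrm{Diag}(Z) \tag{29}
$$

The algorithm computing the matrix $\Pi_h$ is shown in Algorithm 1.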
Algorithm 1 Computing the bag-of-hitting-paths probability matrix of G.
Input:

- A graph $G$ containing $n$ nodes.
- The $n \times n$ adjacency matrix $A$ associated to $G$, containing affinities.
- The $n \times n$ cost matrix $C$ associated to $G$.
- The inverse temperature parameter $\theta$.

Output:

- The $n \times n$ bag-of-hitting-paths probability matrix $\Pi_h$ with zero-length paths included, containing the probability of picking a path starting in node $i$ and ending in node $j$, when sampling paths according to a Boltzmann distribution.
- The $n \times n$ bag-of-hitting-paths probability matrix $\overline{\Pi}_h$ with zero-length paths excluded.

1. $D \leftarrow \mathrm{Diag}(Ae)$ {the row-normalization matrix}
2. $P^{\mathrm{ref}} \leftarrow D^{-1} A$ {the reference transition probabilities matrix}
3. $W \leftarrow P^{\mathrm{ref}} \circ \exp[-\theta C]$ {elementwise exponential and multiplication $\circ$}
4. $Z \leftarrow (I - W)^{-1}$ {the fundamental matrix}
5. $D_h \leftarrow \mathrm{Diag}(Z)$ {the column-normalization matrix for hitting probabilities}
6. $Z_h \leftarrow Z D_h^{-1}$ {column-normalize the fundamental matrix}
7. $\mathcal{Z}_h \leftarrow e^{T} Z_h e$ {compute normalization factor}
8. $\Pi_h \leftarrow Z_h / \mathcal{Z}_h$ {the bag-of-hitting-paths probability matrix with zero-length paths included}
9. $\overline{Z}_h \leftarrow Z D_h^{-1} - I$ {column-normalize the fundamental matrix and subtract the zero-length paths contribution}
10. $\overline{\mathcal{Z}}_h \leftarrow e^{T} \overline{Z}_h e$ {compute normalization factor}
11. $\overline{\Pi}_h \leftarrow \overline{Z}_h / \overline{\mathcal{Z}}_h$ {the bag-of-hitting-paths probability matrix with zero-length paths excluded}
12. return $\Pi_h$, $\overline{\Pi}_h$
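Algorithm 1 translates almost line by line into NumPy. The sketch below is one possible transcription, not the authors' implementation; the function name and the toy graph are assumptions, and the graph is assumed to be connected with no isolated node so that every row of $A$ can be normalized.

```python
import numpy as np

def bag_of_hitting_paths(A, C, theta):
    """Return (Pi_h, Pi_h_bar): hitting-path probabilities with and without zero-length paths."""
    A = np.asarray(A, dtype=float)
    C = np.asarray(C, dtype=float)
    n = A.shape[0]
    P_ref = A / A.sum(axis=1, keepdims=True)   # steps 1-2: row-normalize the affinities
    W = P_ref * np.exp(-theta * C)             # step 3
    Z = np.linalg.inv(np.eye(n) - W)           # step 4: fundamental matrix
    Z_h = Z / np.diag(Z)                       # steps 5-6: column j is divided by z_jj
    Pi_h = Z_h / Z_h.sum()                     # steps 7-8
    Z_h_bar = Z_h - np.eye(n)                  # step 9: remove zero-length paths
    Pi_h_bar = Z_h_bar / Z_h_bar.sum()         # steps 10-11
    return Pi_h, Pi_h_bar

# Hypothetical 4-node undirected graph with unit costs on existing edges.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
C = np.where(A > 0, 1.0, 0.0)
Pi_h, Pi_h_bar = bag_of_hitting_paths(A, C, theta=1.0)
print(Pi_h.sum(), Pi_h_bar.sum())
```

Both returned matrices sum to one, and the diagonal of `Pi_h_bar` is zero because the only hitting path from a node to itself is the zero-length path.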



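On the other hand, if we dismiss zero-length paths, from Equation (21),

$$
\overline{P}_h(s = i, e = j) = \frac{\sum_{\wp \in \overline{\mathcal{P}}^{h}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)]}{\sum_{i',j'=1}^{n} \sum_{\wp' \in \overline{\mathcal{P}}^{h}_{i'j'}} \tilde{\pi}^{\mathrm{ref}}(\wp') \exp[-\theta \tilde{c}(\wp')]} = \frac{z_{ij}/z_{jj} - \delta_{ij}}{\sum_{i',j'=1}^{n} (z_{i'j'}/z_{j'j'} - \delta_{i'j'})} \tag{30}
$$

and we obtain, for the bag-of-hitting-paths probability matrix with zero-length paths excluded,

$$
\overline{\Pi}_h = \frac{Z D_h^{-1} - I}{e^{T} (Z D_h^{-1} - I) e}, \quad \text{with } Z = (I - W)^{-1} \text{ and } D_h = \mathrm{Diag}(Z) \tag{31}
$$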

We now briefly discuss a second way for deriving the hitting path proba- 
bilities. 



An alternative derivation of the hitting paths probabilities 

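This result can also be understood intuitively as follows (see [26] for a similar argument). Each non-hitting path $\wp_{ij} \in \mathcal{P}_{ij}$ (either including or excluding zero-length paths - the argument holds in both cases) can be split uniquely into two sub-paths, before hitting node $j$ for the first time, $\wp^{h}_{ij} \in \mathcal{P}^{h}_{ij}$, and after hitting node $j$, $\wp_{jj} \in \mathcal{P}_{jj}$. These two sub-paths can be chosen independently since their concatenation is a valid path, with $\wp^{h}_{ij} \circ \wp_{jj} \in \mathcal{P}_{ij}$ being the concatenation of the two paths. Now, since $\tilde{c}(\wp_{ij}) = \tilde{c}(\wp^{h}_{ij}) + \tilde{c}(\wp_{jj})$ and $\tilde{\pi}^{\mathrm{ref}}(\wp_{ij}) = \tilde{\pi}^{\mathrm{ref}}(\wp^{h}_{ij}) \, \tilde{\pi}^{\mathrm{ref}}(\wp_{jj})$ for any $\wp_{ij} = \wp^{h}_{ij} \circ \wp_{jj}$, we easily obtain

$$
\begin{aligned}
z_{ij} &= \sum_{\wp_{ij} \in \mathcal{P}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp_{ij}) \exp[-\theta \tilde{c}(\wp_{ij})] \\
&= \sum_{\wp^{h}_{ij} \in \mathcal{P}^{h}_{ij}} \sum_{\wp_{jj} \in \mathcal{P}_{jj}} \tilde{\pi}^{\mathrm{ref}}(\wp^{h}_{ij}) \, \tilde{\pi}^{\mathrm{ref}}(\wp_{jj}) \exp[-\theta \tilde{c}(\wp^{h}_{ij})] \exp[-\theta \tilde{c}(\wp_{jj})] \\
&= \left[ \sum_{\wp^{h}_{ij} \in \mathcal{P}^{h}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp^{h}_{ij}) \exp[-\theta \tilde{c}(\wp^{h}_{ij})] \right] \left[ \sum_{\wp_{jj} \in \mathcal{P}_{jj}} \tilde{\pi}^{\mathrm{ref}}(\wp_{jj}) \exp[-\theta \tilde{c}(\wp_{jj})] \right] \\
&= z^{h}_{ij} \, z_{jj}
\end{aligned} \tag{32}
$$

and therefore $z^{h}_{ij} = z_{ij}/z_{jj}$. Now, since $P_h(s = i, e = j) = z^{h}_{ij} / \sum_{i',j'=1}^{n} z^{h}_{i'j'}$, Equation (27) follows.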

An intuitive interpretation for the $z^{h}_{ij}$

In this section, we provide the intuition behind the elements of the hitting paths fundamental matrix, $Z_h$. Let us consider a particular random walk with absorbing state $k$ on the graph $G$ whose transition probabilities are given by $p^{k}_{ij} = p^{\mathrm{ref}}_{ij} \exp[-\theta c_{ij}] = w_{ij}$ when $i \ne k$ and $p^{k}_{kj} = 0$ otherwise. In other words, the node $k$ is made absorbing - it corresponds to hitting paths with node $k$ as hitting node. When the walker reaches this node, he stops his walk and disappears. Moreover, since the $\exp[-\theta c_{ij}] < 1$, the matrix of transition probabilities $p^{k}_{ij}$ is substochastic and the random walker also has a nonzero probability $(1 - \sum_{j=1}^{n} p^{k}_{ij})$ of disappearing at each step of its random walk and in each node $i$ for which $(1 - \sum_{j=1}^{n} p^{k}_{ij}) > 0$. This stochastic process has been called an "evaporating random walk" in [63] or an "exponentially killed random walk" in [70]. The transition probabilities $p^{k}_{ij}$ are collected in the substochastic transition matrix $P^{k}$, which is equal to $W$ (Equation (5)) except its $k$th row which is full of 0's.

Now, let us consider column $k$ (corresponding to the hitting, or absorbing, node) of the fundamental matrix, $\mathrm{col}_k(Z) = Z e_k$. Since the fundamental matrix is $Z = (I - W)^{-1}$ (Equation (12)), we easily obtain $(I - W)(Z e_k) = I e_k = e_k$. Or, written elementwise,

$$
\begin{cases}
z_{ik} = \sum_{j=1}^{n} w_{ij} z_{jk}, & \text{for every } i \ne k \\
z_{kk} = \sum_{j=1}^{n} w_{kj} z_{jk} + 1
\end{cases} \tag{33}
$$

When considering hitting paths instead, $z^{h}_{kk} = 1$ (see Equation (26)) and $w_{kj} = 0$ for all $j$ (node $k$ is made absorbing) so that the second line of Equation (33) - the boundary condition - becomes simply $z^{h}_{kk} = 1$ for hitting paths.
Moreover, we know that $z^{h}_{ik} = z_{ik}/z_{kk}$ for any $i \ne k$. Thus, dividing the first line of Equation (33) by $z_{kk}$ and recalling that $w_{ij} = p^{k}_{ij}$ when $i \ne k$ provides

$$
\begin{cases}
z^{h}_{ik} = \sum_{j=1}^{n} p^{k}_{ij} z^{h}_{jk}, & \text{for every } i \ne k \\
z^{h}_{kk} = 1
\end{cases} \tag{34}
$$

But this is exactly the set of recurrence equations computing the probability of hitting node $k$ when starting from node $i$ (see, e.g., [38, 62, 74]). Therefore, the $z^{h}_{ik}$ represent the probabilities of reaching node $k$ from node $i$, without disappearing during the evaporating random walk with transition probabilities $p^{k}_{ij}$.

Finally, since $p^{k}_{ij} = w_{ij}$ when $i \ne k$ and $p^{k}_{kj}$ does not appear in Equation (34), this equation can be rewritten as

$$
\begin{cases}
z^{h}_{ik} = \sum_{j=1}^{n} w_{ij} z^{h}_{jk}, & \text{for every } i \ne k \\
z^{h}_{kk} = 1
\end{cases} \tag{35}
$$

which provides a recurrence formula for computing the $z^{h}_{ik}$ when fixing the hitting node $k$.
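The probabilistic reading of $z^{h}_{ik}$ can be illustrated by simulating the evaporating random walk and counting how often the walker reaches the target before disappearing. The sketch below is a Monte Carlo illustration only; the toy graph, the value of $\theta$, the seed, the number of walks and the function name are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed, for reproducibility only

# Hypothetical 4-node undirected graph with unit costs on existing edges.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
C = np.where(A > 0, 1.0, 0.0)
theta = 0.7
n = A.shape[0]

W = (A / A.sum(axis=1, keepdims=True)) * np.exp(-theta * C)
Z = np.linalg.inv(np.eye(n) - W)
Z_h = Z / np.diag(Z)                      # z^h_ij = z_ij / z_jj, Equation (26)

# Row i: probabilities of moving to nodes 0..n-1, then of evaporating (index n).
moves = np.hstack([W, 1.0 - W.sum(axis=1, keepdims=True)])

def estimate_hitting_probability(i, k, n_walks=10000):
    """Fraction of evaporating walks started in i that reach k before disappearing."""
    hits = 0
    for _ in range(n_walks):
        node = i
        while node != k:
            node = rng.choice(n + 1, p=moves[node])
            if node == n:                 # the walker evaporated before reaching k
                break
        else:
            hits += 1                     # the loop ended because node == k
    return hits / n_walks

estimate = estimate_hitting_probability(0, 3)
print(estimate, Z_h[0, 3])
```

The empirical frequency should agree with `Z_h[0, 3]` up to sampling noise, which shrinks as the number of simulated walks grows.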

4. Two novel distance measures between nodes based on the hitting paths 
probabilities 

In this section, two distance measures are derived from the hitting paths 
probabilities including zero-length paths 3 . Indeed, there have been recent ef- 
forts in order to design new families of distances between nodes of a graph 
[13, 82, 3]. The two distance measures defined in this section are in the same 
spirit, but the second one benefits from some nice properties that will be de- 
tailed. 

4.1. A first distance measure based on the associated surprisal measure 

This section shows that the associated weighted surprisal measure, $-\log P_h(s = i, e = j)$, quantifying the "surprise" generated by the outcome $(s = i) \wedge (e = j)$, when symmetrised, is a distance measure. This distance $\Delta^{h}_{ij}$ associated to the bag-of-hitting-paths is defined as follows

$$
\Delta^{h}_{ij} =
\begin{cases}
-\dfrac{\log P_h(s = i, e = j) + \log P_h(s = j, e = i)}{2} & \text{if } i \ne j \\
0 & \text{if } i = j
\end{cases} \tag{36}
$$

where $P_h(s = i, e = j)$, $P_h(s = j, e = i)$ are computed thanks to $P_h(s = i, e = j) = (z_{ij}/z_{jj}) / \sum_{i',j'=1}^{n} (z_{i'j'}/z_{j'j'})$ (see Equation (27)). Obviously, $\Delta^{h}_{ij} \ge 0$, $\Delta^{h}_{ij}$ is symmetric, and $\Delta^{h}_{ii} = 0$ for all $i$. Moreover, $\Delta^{h}_{ij}$ is equal to zero only when $i = j$. It is shown in Appendix B that this measure is a distance since it obeys the triangle inequality, in addition to the other mentioned properties. This distance will be called the bag-of-hitting-paths surprisal distance.




3 These results do not hold for bag-of-paths excluding zero-length paths. 
4.2. A second distance measure based on the bag-of-hitting-paths

4.2.1. Definition of the distance measure

The second distance measure also relies on the result of Inequation (B.5) in Appendix B, and will be based on the quantity $\phi(i, j) = -\frac{1}{\theta} \log z^{h}_{ij}$. Indeed, by recalling that $z^{h}_{ij} = z_{ij}/z_{jj}$ (see Equation (26)) and $P_h(s = i, e = j) = z_{ij}/(\mathcal{Z}_h z_{jj}) = z^{h}_{ij}/\mathcal{Z}_h$ (Equation (27)), we obtain from Equation (B.5)

$$
\begin{aligned}
& P_h(s = i, e = k) \ge \mathcal{Z}_h \, P_h(s = i, e = j) \, P_h(s = j, e = k) \\
\Leftrightarrow\ & \frac{z^{h}_{ik}}{\mathcal{Z}_h} \ge \mathcal{Z}_h \, \frac{z^{h}_{ij}}{\mathcal{Z}_h} \, \frac{z^{h}_{jk}}{\mathcal{Z}_h} \\
\Leftrightarrow\ & -\frac{1}{\theta} \log z^{h}_{ik} \le -\frac{1}{\theta} \log z^{h}_{ij} - \frac{1}{\theta} \log z^{h}_{jk}
\end{aligned} \tag{37}
$$

The quantity $\phi(i, j) = -\frac{1}{\theta} \log z^{h}_{ij}$ will be called the potential [18] of node $i$ with respect to node $j$. Indeed, it has been shown [27] that when computing the continuous-state equivalent of the randomized shortest paths framework [63], $\phi(x, y)$ plays the role of a potential inducing a drift (external force) in the diffusion equation.

For the sake of completeness, let us recall that (Equation (26)) $z^{h}_{ij}$ is given by $z^{h}_{ij} = \sum_{\wp \in \mathcal{P}^{h}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)] = z_{ij}/z_{jj}$ where $z_{ij}$ is element $i, j$ of the fundamental matrix $Z$ (see Equation (12)). From this last equation, $\phi(i, j)$ can be interpreted (up to a scaling factor) as the logarithm of the expectation of the reward $\exp[-\theta \tilde{c}(\wp)]$ with respect to the reference probability, when considering absorbing random walks starting from node $i$ and ending in $j$.

Inequation (37) suggests to define a distance $\Delta^{\phi}_{ij} = (\phi(i, j) + \phi(j, i))/2$. It has all the properties of a distance measure, including the triangle inequality, which is verified thanks to Inequation (37). This distance measure can easily be expressed as a function of the surprisal distance (see Equation (36)) as $\Delta^{\phi}_{ij} = (\Delta^{h}_{ij} - \log \mathcal{Z}_h)/\theta$ for $i \ne j$. This shows that the newly introduced distance is equivalent to the previous one, up to the addition of a constant and a rescaling. The definition of the bag-of-hitting-paths potential distance is therefore

$$
\Delta^{\phi}_{ij} =
\begin{cases}
\dfrac{\phi(i, j) + \phi(j, i)}{2} & \text{if } i \ne j \\
0 & \text{if } i = j
\end{cases}
, \quad \text{where } \phi(i, j) = -\frac{1}{\theta} \log z^{h}_{ij} = -\frac{1}{\theta} \log \left( \frac{z_{ij}}{z_{jj}} \right) \tag{38}
$$

and $z_{ij}$ is element $i, j$ of the fundamental matrix $Z$ (see Equation (12)).

From Equation (29), it can be easily seen that the matrix $Z_h$ containing the $z^{h}_{ij}$ can be computed thanks to Algorithm 1 without the normalization steps 7 and 8. The distance matrix containing the $\Delta^{\phi}_{ij}$ is denoted as $\Delta^{\phi}$. It is computed in almost the same way as for $\Delta^{h}$, by means of Algorithm 2. Notice that if there are many different connected components,

4.2.2. Some properties of the distance

This potential distance $\Delta^{\phi}$ has three advantages over the surprisal distance $\Delta^{h}$:
Algorithm 2 Computing the bag-of-hitting-paths potential distance matrix of G.
Input:

- A graph $G$ containing $n$ nodes.
- The $n \times n$ adjacency matrix $A$ associated to $G$, containing affinities.
- The $n \times n$ cost matrix $C$ associated to $G$ (usually, the costs are the inverse of the affinities, but other choices are possible).
- The inverse temperature parameter $\theta$.

Output:

- The $n \times n$ bag-of-hitting-paths potential distance matrix $\Delta^{\phi}$, containing the pairwise distances between nodes.

1. $D \leftarrow \mathrm{Diag}(Ae)$ {the row-normalization matrix}
2. $P^{\mathrm{ref}} \leftarrow D^{-1} A$ {the reference transition probabilities matrix}
3. $W \leftarrow P^{\mathrm{ref}} \circ \exp[-\theta C]$ {elementwise exponential and multiplication $\circ$}
4. $Z \leftarrow (I - W)^{-1}$ {the fundamental matrix}
5. $D_h \leftarrow \mathrm{Diag}(Z)$ {the column-normalization matrix for hitting probabilities}
6. $Z_h \leftarrow Z D_h^{-1}$ {column-normalize the fundamental matrix}
7. $\Phi \leftarrow -\log(Z_h)/\theta$ {take elementwise logarithm for computing the potentials}
8. $\Delta^{\phi} \leftarrow (\Phi + \Phi^{T})/2$ {symmetrize the matrix}
9. $\Delta^{\phi} \leftarrow \Delta^{\phi} - \mathrm{Diag}(\Delta^{\phi})$ {put diagonal to zero}
10. return $\Delta^{\phi}$
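A direct NumPy transcription of Algorithm 2 is sketched below. It is illustrative rather than the authors' code: the function name and toy graph are assumptions, and the graph is assumed to be connected so that every $z^{h}_{ij}$ is strictly positive and the logarithm is finite.

```python
import numpy as np

def potential_distance(A, C, theta):
    """Bag-of-hitting-paths potential distance matrix (steps 1-9 of Algorithm 2)."""
    A = np.asarray(A, dtype=float)
    C = np.asarray(C, dtype=float)
    n = A.shape[0]
    P_ref = A / A.sum(axis=1, keepdims=True)   # steps 1-2
    W = P_ref * np.exp(-theta * C)             # step 3
    Z = np.linalg.inv(np.eye(n) - W)           # step 4
    Z_h = Z / np.diag(Z)                       # steps 5-6: column j divided by z_jj
    Phi = -np.log(Z_h) / theta                 # step 7: potentials phi(i, j)
    Delta = (Phi + Phi.T) / 2.0                # step 8: symmetrize
    np.fill_diagonal(Delta, 0.0)               # step 9
    return Delta

# Hypothetical 4-node undirected graph with unit costs on existing edges.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
C = np.where(A > 0, 1.0, 0.0)
Delta = potential_distance(A, C, theta=1.0)

# Node 2 lies on every path between nodes 0 and 3 in this toy graph.
print(Delta[0, 3], Delta[0, 2] + Delta[2, 3])
```

In this toy graph node 3 is attached only to node 2, so the two printed values coincide, which is the additivity that holds when an intermediate node lies on every path between the two end nodes.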



• The potential distance is graph-geodetic, meaning that $\Delta^{\phi}_{ik} = \Delta^{\phi}_{ij} + \Delta^{\phi}_{jk}$ if and only if every path from $i$ to $k$ passes through $j$ [13] (see Appendix C for the proof).

• For an undirected graph $G$, the distance $\Delta^{\phi}_{ij}$ recovers the shortest path distance when $\theta$ becomes large, $\theta \to \infty$. In that case, Equation (38) reduces to the Bellman-Ford formula (see, e.g., [5, 16, 19, 35, 61, 68]) for computing the shortest path distance, $\Delta^{\mathrm{SP}}_{ik} = \min_{j \in Succ(i)} (c_{ij} + \Delta^{\mathrm{SP}}_{jk})$ and $\Delta^{\mathrm{SP}}_{kk} = 0$ (see Appendix D for the proof).

• For an undirected graph $G$, the distance $\Delta^{\phi}_{ij}$ recovers half the commute cost distance when $\theta$ becomes small, $\theta \to 0^{+}$ (see Appendix E for the proof). The commute cost between node $i$ and node $j$ is the expected cost incurred by a random walker for reaching node $j$ for the first time from node $i$ and going back to node $i$. The recurrence expression for computing the average first-passage cost is $m_{ik} = \sum_{j \in Succ(i)} p^{\mathrm{ref}}_{ij} (c_{ij} + m_{jk})$ with boundary condition $m_{kk} = 0$ (see, e.g., [38, 56, 62, 74]). The commute cost is then $\Delta^{\mathrm{CC}}_{ij} = (m_{ij} + m_{ji})$. Notice that, for a given graph $G$, the commute cost between two nodes is proportional to the commute time between these two nodes, and therefore also proportional to the resistance distance (see [39] and footnote 4). However, even if the potential distance converges to half the commute cost when $\theta \to 0^{+}$, we have to stress that $\theta$ should not become equal to zero since the matrix $(I - W)$ becomes rank-deficient when $\theta = 0$ ($W$ is then stochastic). This means that Equation (12) cannot be used for computing the commute cost when $\theta$ is exactly equal to zero. Despite this annoying


4 This can easily be shown from the formula computing the commute cost in terms of the Lapla- 
cian matrix derived in the appendix of [25]. 
| Subset | Topics | Size of each topic |
|---|---|---|
| G-2cl-A | Politics/general, Sport/baseball | 200 |
| G-2cl-B | Computer/graphics, Motor/motorcycles | 200 |
| G-2cl-C | Space/general, Politics/mideast | 200 |
| G-3cl-A | Sport/baseball, Space/general, Politics/mideast | 200 |
| G-3cl-B | Computer/windows, Motor/autos, Religion/general | 200 |
| G-3cl-C | Sport/hockey, Religion/atheism, Medicine/general | 200 |
| G-5cl-A | Computer/windowsx, Cryptography/general, Politics/mideast, Politics/guns, Religion/christian | 200 |
| G-5cl-B | Computer/graphics, Computer/pchardware, Motor/autos, Religion/atheism, Politics/mideast | 200 |
| G-5cl-C | Computer/machardware, Sport/hockey, Medicine/general, Religion/general, Forsale/general | 200 |

Table 2: Document subsets for clustering experiments. Nine subsets have been extracted from the full Newsgroup dataset, with 2, 3 and 5 topics as proposed in [81]. Each cluster is composed of 200 documents.



fact, we found that the approximation is very accurate for small values of $\theta$.



These three properties make the distance quite attractive. 
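The large-$\theta$ behaviour can be observed numerically by comparing the potential distance with the shortest path distance for increasing $\theta$. The sketch below is a small experiment under stated assumptions: the toy graph, the unit costs, the chosen values of $\theta$ and the use of the Floyd-Warshall algorithm for the reference distances are all illustrative choices, not taken from the paper.

```python
import numpy as np

def potential_distance(A, C, theta):
    """Potential distance of Equation (38), computed as in Algorithm 2."""
    n = A.shape[0]
    W = (A / A.sum(axis=1, keepdims=True)) * np.exp(-theta * C)
    Z = np.linalg.inv(np.eye(n) - W)
    Phi = -np.log(Z / np.diag(Z)) / theta
    Delta = (Phi + Phi.T) / 2.0
    np.fill_diagonal(Delta, 0.0)
    return Delta

def shortest_path_distance(A, C):
    """All-pairs shortest path costs (Floyd-Warshall), used only as a reference."""
    n = A.shape[0]
    D = np.where(A > 0, C, np.inf)
    np.fill_diagonal(D, 0.0)
    for k in range(n):
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

# Hypothetical 4-node undirected graph with unit costs on existing edges.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
C = np.where(A > 0, 1.0, 0.0)
SP = shortest_path_distance(A, C)

# Largest absolute deviation from the shortest path distance for each theta.
errors = {theta: np.abs(potential_distance(A, C, theta) - SP).max()
          for theta in (1.0, 5.0, 15.0)}
print(errors)
```

Very large values of $\theta$ make the entries of $W$ underflow toward zero, so in practice the limit is better approached with moderate values or with a log-domain computation.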



4.2.3. Relationships with the Bellman-Ford formula 

Moreover, it is shown in Appendix D that the potential $\phi(i, j)$ can be computed through the following recurrence formula

$$
\phi(i, k) =
\begin{cases}
-\dfrac{1}{\theta} \log \left[ \sum_{j \in Succ(i)} p^{\mathrm{ref}}_{ij} \exp[-\theta (c_{ij} + \phi(j, k))] \right] & \text{if } i \ne k \\
0 & \text{if } i = k
\end{cases} \tag{39}
$$

which is an extension of Bellman-Ford's formula for computing the shortest path [5, 16, 19, 35, 61, 68].

Indeed, the potential $\phi(i, j)$ tends to the average first-passage cost when $\theta \to 0^{+}$ and to the shortest path cost when $\theta \to \infty$. This formula is a generalization of the distributed consensus algorithm developed in [72], which considers binary costs only.
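Recurrence (39) can be iterated as a fixed-point scheme in the log domain, in the same way Bellman-Ford relaxations are iterated. The sketch below is one possible implementation, assuming a connected toy graph, a fixed number of sweeps and a zero initial guess; the function name and these settings are not from the paper.

```python
import numpy as np

def potentials_to_target(P_ref, C, theta, k, n_iter=200):
    """Iterate Equation (39) to obtain phi(i, k) for every node i and a fixed target k."""
    n = P_ref.shape[0]
    with np.errstate(divide="ignore"):
        log_p = np.log(P_ref)                  # -inf where there is no edge
    phi = np.zeros(n)                          # initial guess
    for _ in range(n_iter):
        # exponent[i, j] = log p_ij - theta * (c_ij + phi(j, k))
        exponent = log_p - theta * (C + phi[None, :])
        m = exponent.max(axis=1, keepdims=True)
        # Stable log-sum-exp over the successors j of each node i.
        phi = -(m[:, 0] + np.log(np.exp(exponent - m).sum(axis=1))) / theta
        phi[k] = 0.0                           # boundary condition
    return phi

# Hypothetical 4-node undirected graph with unit costs on existing edges.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
C = np.where(A > 0, 1.0, 0.0)
theta, k = 1.0, 3
P_ref = A / A.sum(axis=1, keepdims=True)
phi_iterative = potentials_to_target(P_ref, C, theta, k)

# Closed form of Equation (38) for comparison: phi(i, k) = -log(z_ik / z_kk) / theta.
Z = np.linalg.inv(np.eye(4) - P_ref * np.exp(-theta * C))
phi_closed_form = -np.log(Z[:, k] / Z[k, k]) / theta
print(np.allclose(phi_iterative, phi_closed_form))
```

Working with logarithms avoids the underflow that the matrix-inversion route can suffer from when $\theta$ is large, at the price of an iterative computation per target node.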



5. First experiments on a semi-supervised classification task 

This experimental section aims at investigating the potential of the bag- 
of-hitting-paths distances and their derived kernels as similarity measures be- 
tween nodes. For that purpose, kernels associated with the introduced dis- 
tance measures are assessed in a semi-supervised task and are compared with 
other competitive techniques. Additional experiments on other data sets are 
currently ongoing and will be reported when completed. 
5.1. Dataset description

Comparison of the different methods will be performed on the well-known real-world Newsgroups dataset (footnote 5). This dataset is composed of 20000 text documents taken from 20 discussion groups of the Usenet diffusion list. Since the objective of the experiment is to correctly recognize topic clusters, nine subsets related to different topics are extracted from the original dataset, as listed in Table 2 [81]. Each topic of a subset is represented by 200 documents extracted randomly from the corresponding newsgroup. The subsets with two classes (G-2cl-A,B,C) thus contain 400 documents, 200 in each class. Similarly, subsets with three classes contain 600 documents and subsets with five classes contain 1000 documents. Each subset is composed of different topics, some of which are easy to separate (e.g., Computer/windowsx and Religion/christian) while others are harder to separate (e.g., Computer/graphics and Computer/pchardware). These data sets are used throughout all experimental settings.

Initially, this dataset does not have a graph structure but is composed of a 
feature space (terms) of high dimensionality. To transform this dataset into a 
graph structure, a fairly standard preprocessing has been performed, which is 
directly inspired by the paper of Yen et al. [81]. 

Basically, the first step is to reduce the high dimensionality of the feature 
space (terms), by removing stop words, applying a stemming algorithm on 
each term, removing too common or uncommon terms and by removing terms 
with low mutual information with documents. Secondly, a term-document matrix $W$ is constructed with the remaining terms and documents. The elements $w_{ij}$ are tf-idf values [48] of term $i$ in document $j$. Each row of the term-document matrix $W$ is then normalized to 1. Finally, the adjacency matrix defining the links between documents is given by $A = W^{T} W$.

5.2. Compared distances and algorithms 

This paper derived distance measures from the bag-of-paths probabilities. 
In order to apply these distances to unsupervised and supervised learning 
methods, it is convenient to transform them into similarity matrices, called 
here kernels, for simplicity. From [7, 20], a centered kernel matrix $K$ can be derived from a squared distances matrix $\Delta^{(2)}$ as follows

$$
K = -\frac{1}{2} H \Delta^{(2)} H \tag{40}
$$

where $H = (I - e e^{T}/n)$ is the centering matrix. However, the obtained kernels are not necessarily positive semi-definite, a requirement for some unsupervised or supervised kernel methods. This problem can be fixed by removing the negative eigenvalues ($\lambda_i < 0$; see, e.g., [52]). Now, although the kernels are not positive semi-definite, we use them as such, as we did not notice any significant difference in the experiments when considering only the positive eigenvalues of the kernels. The modularity kernel has been used for semi-supervised learning earlier by Zhang et al. [85] with the difference that they only preserve some of the largest positive eigenvalues for their kernel.
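Equation (40) and the optional removal of negative eigenvalues can be sketched as follows. The function names and the random test points are assumptions made for the example; with Euclidean distances the resulting kernel is the centered inner-product matrix, which gives a simple sanity check.

```python
import numpy as np

def kernel_from_distances(Delta):
    """Centered kernel K = -1/2 H Delta^(2) H of Equation (40), with elementwise squares."""
    n = Delta.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    return -0.5 * H @ (Delta ** 2) @ H

def remove_negative_eigenvalues(K):
    """Set the negative eigenvalues of a symmetric matrix to zero."""
    eigval, eigvec = np.linalg.eigh((K + K.T) / 2.0)
    return (eigvec * np.clip(eigval, 0.0, None)) @ eigvec.T

# Sanity check on hypothetical Euclidean data.
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
Delta = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
K = kernel_from_distances(Delta)
X_centered = X - X.mean(axis=0)
print(np.allclose(K, X_centered @ X_centered.T))
```

For a non-Euclidean distance matrix, `remove_negative_eigenvalues(K)` returns the closest positive semi-definite variant in the sense of discarding the negative part of the spectrum.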
The following list presents the kernels compared in this paper: 



5 Available from http://people.csail.mit.edu/jrennie/20Newsgroups/ .
• The kernel associated to the bag-of-hitting-paths potential distance ($K_{\mathrm{BoPP}}$) (Equations (38) and (40)).

• The kernel associated to the bag-of-hitting-paths surprisal distance ($K_{\mathrm{BoPS}}$) (Equations (36) and (40)).

In addition, two state-of-the-art methods and the standard modularity matrix are added to this list and compared to the previous ones.

• The standard modularity matrix based on the bag-of-links model (Q) used by Zhang et al. [85].

• The kernel introduced in Zhou and Schölkopf (Zhou) [88]. This technique will be considered as a first baseline method for the semi-supervised experiments.

• Finally we will also report the results obtained by the D-walks method (D-walks) [9], as a second baseline in the semi-supervised experiments.

All the above kernels and methods will be compared on the same experimental settings, described hereafter. For illustration, a picture of some of the kernels is shown in Figure 1.

5.3. Experimental settings 

In this experiment, we address the task of classification of unlabeled nodes 
in partially labelled graphs. Notice that the goal of this experiment is not to 
design a state-of-the-art semi-supervised classifier; rather it is to study the per- 
formances of the proposed measures, in comparison with other state-of-the-art 
methods in graph-based semi-supervised classification. The kernels that are 
compared are the same as in the clustering experiments. 

The method is directly inspired by [73]. It consists of two steps: (1) extracting the latent social dimensions, which may be done using any matrix decomposition technique or by using a graphical topic model. Here, we used, as in [73], a simple Singular Value Decomposition. More precisely, we extracted the top eigenvectors of the suggested kernel measures, which can be any of those introduced in this paper (see Section 5.2). (2) training a classifier on the extracted latent space. In this space, each feature corresponds to one latent variable (i.e. one of the top eigenvectors). The number of social dimensions has been set to 5 for all the suggested measures and the classifier is a one-vs-rest linear SVM. We also tested different numbers of social dimensions [10, 50, 500] but the performances did not change significantly - these results are therefore not reported here.
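Step (1) can be sketched with a symmetric eigendecomposition. The snippet below is only an illustration of extracting the leading eigenvectors of a kernel matrix: the function name, the random kernel and the choice of ranking eigenvalues by their algebraic value are assumptions, and the classifier of step (2) is not shown.

```python
import numpy as np

def latent_dimensions(K, d=5):
    """Return the d eigenvectors of K with the largest eigenvalues, one column per dimension."""
    eigval, eigvec = np.linalg.eigh((K + K.T) / 2.0)   # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:d]                 # indices of the d largest eigenvalues
    return eigvec[:, top], eigval[top]

# Hypothetical positive semi-definite kernel on 8 nodes.
rng = np.random.default_rng(0)
B = rng.normal(size=(8, 8))
K = B @ B.T
features, values = latent_dimensions(K, d=5)
print(features.shape)
```

Each row of `features` is then the feature vector of one node, on which any standard classifier can be trained using the labelled nodes.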

The classification accuracy is reported for a labeling rate of 10%, i.e., the proportion of nodes for which the label is known. The labels of the remaining nodes are removed and these nodes are used as test data. For this labeling rate, a stratified 10-fold cross-validation was performed, on which performances are averaged. For each fold of the external cross-validation, a 5-fold internal cross-validation is performed on the remaining labelled nodes in order to tune the hyper-parameters of each classifier (typically the $\theta \in \{0.01, 0.1, 0.2, 0.3, \ldots, 1.0\}$ for the bag-of-paths based approaches and the $C \in \{0.001, 0.01, 0.1, 1, 10, 100, 1000\}$ parameter for the SVM). In the case of the D-walks approach, the number of steps has been tuned using the same process. Then, performances on each fold
Figure 1: Images of the different similarity matrices, (a) $K_{\mathrm{BoPP}}$, (b) $K_{\mathrm{BoPS}}$ and (c) Q, computed on the G-3cl-A dataset. Nodes have been sorted according to classes. We observe that classes are clearly visible in (a) and (b). For the standard modularity (c), the class discrimination is less clear.



are assessed on the remaining, unlabeled, nodes (test data) with the hyper- 
parameter tuned during the internal cross-validation. 

For each unlabeled node, the various classifiers predict the most suitable 
category according to the procedures described in the previous sections. We 
report, for each method, the average classification rate and its standard devia- 
tion obtained on the 10 folds of the cross-validation. 

5.4. Results and discussions 

Table 3 reports average classification rates and standard deviations on the nine Newsgroup datasets, for a labeling rate of 10%. It can be seen that the classification rate is generally better using the bag-of-paths based approaches ($K_{\mathrm{BoPP}}$, $K_{\mathrm{BoPS}}$) in comparison with other state-of-the-art methods. Indeed, both the $K_{\mathrm{BoPP}}$ and the $K_{\mathrm{BoPS}}$ consistently provide good, competitive, results.

Notice also that the simple modularity, although below the best methods, especially in the 5-classes setting, provides reasonable results.






| Dataset | $K_{\mathrm{BoPP}}$ | $K_{\mathrm{BoPS}}$ | D-walks | Zhou | Q |
|---|---|---|---|---|---|
| G-2cl-A | 95.11 ± 1.75 | 95.75 ± 2.06 | 91.72 ± 1.64 | 86.75 ± 2.69 | 95.30 ± 1.09 |
| G-2cl-B | 91.79 ± 1.77 | 92.27 ± 1.73 | 88.49 ± 2.24 | 79.84 ± 3.51 | 90.92 ± 1.16 |
| G-2cl-C | 96.29 ± 0.93 | 96.43 ± 0.81 | 94.31 ± 2.37 | 92.59 ± 1.10 | 95.60 ± 1.03 |
| G-3cl-A | 92.82 ± 0.99 | 92.67 ± 1.36 | 90.55 ± 1.39 | 86.22 ± 2.58 | 93.48 ± 0.76 |
| G-3cl-B | 92.12 ± 2.08 | 92.18 ± 1.64 | 90.51 ± 1.75 | 82.05 ± 2.65 | 92.30 ± 1.18 |
| G-3cl-C | 92.27 ± 2.25 | 92.46 ± 1.71 | 88.23 ± 1.59 | 80.30 ± 1.64 | 89.07 ± 2.50 |
| G-5cl-A | 88.34 ± 0.57 | 87.04 ± 2.21 | 84.42 ± 1.79 | 78.94 ± 2.02 | 77.91 ± 2.22 |
| G-5cl-B | 79.66 ± 2.11 | 80.17 ± 1.17 | 77.53 ± 2.13 | 71.84 ± 1.45 | 65.69 ± 3.94 |
| G-5cl-C | 78.30 ± 2.23 | 76.15 ± 2.56 | 78.14 ± 1.66 | 71.33 ± 2.39 | 65.98 ± 4.83 |

Table 3: Classification rate and standard deviation for the bag-of-paths based kernels, the modularity matrix, and two competitive methods (D-walks and Zhou) obtained on each dataset, using 5 social dimensions. Only the results for graphs with 10% labeling rate are reported.



6. Conclusion 

This work introduced the bag-of-paths framework considering a bag con- 
taining the set of paths in the network. By defining a Boltzmann distribution 
on this set of paths penalizing long paths, we can easily compute various quan- 
tities such as distance measures between nodes or an extension of the modular- 
ity. Experiments have shown that the BoP framework can provide competitive 
algorithms within a clear theoretical framework. 

Indeed, as shown in the semi-supervised experiment, the kernels associated to the distance measures derived from the bag-of-paths probabilities achieve state-of-the-art performances. Consistency of performance across the different datasets shows that the bag-of-paths framework seems to induce some promising distance and similarity measures on graphs. Other experiments are currently being carried out and will be reported when completed.

Other quantities of interest can be defined within the BoP framework. For 
instance, a betweenness measure can be defined as P(int = j\s = i, e = k), the 
probability that a path starting in i and ending in k visits j as an intermediate 
node [45]. Another idea would be to reformulate the modularity matrix in 
terms of paths instead of direct links. Still another application would be the 
computation of a robustness measure capturing the criticality of the nodes. The 
idea then would be to compute the change in reachability between nodes when 
deleting one node within the BoP framework. Nodes having a wide impact on 
reachability are then considered as highly critical. Finally, we plan to evaluate 
experimentally the potential distance (see Equation (39)) as a distance between 
sequences of characters by adapting it to a directed acyclic graph, as in [26]. 

Appendix 

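Appendix A. Computation of all entries of $\mathbf{Z}^{(-j)}$ in terms of the fundamental matrix

All the entries of $\mathbf{Z}^{(-j)}$ can be computed efficiently in terms of
the fundamental matrix $\mathbf{Z} = (\mathbf{I} - \mathbf{W})^{-1}$. This is a
simple application of the Sherman-Morrison formula (see, e.g., [28, 67]) for
the inverse of the rank-one update of a matrix: if $\mathbf{c}$ and
$\mathbf{d}$ are column vectors,

$$(\mathbf{A} + \mathbf{c}\mathbf{d}^{\mathrm{T}})^{-1} = \mathbf{A}^{-1} - \frac{\mathbf{A}^{-1}\mathbf{c}\,\mathbf{d}^{\mathrm{T}}\mathbf{A}^{-1}}{1 + \mathbf{d}^{\mathrm{T}}\mathbf{A}^{-1}\mathbf{c}} \tag{A.1}$$

Indeed, from
$\mathbf{W}^{(-j)} = \mathbf{W} - \mathbf{e}_j (\mathbf{w}_j^{\mathrm{r}})^{\mathrm{T}}$,
where $\mathbf{w}_j^{\mathrm{r}} = \mathbf{col}_j(\mathbf{W}^{\mathrm{T}})$ is
row $j$ of $\mathbf{W}$ viewed as a column vector, we have
$(\mathbf{I} - \mathbf{W}^{(-j)}) = (\mathbf{I} - \mathbf{W}) + \mathbf{e}_j (\mathbf{w}_j^{\mathrm{r}})^{\mathrm{T}}$.
By setting $\mathbf{A} = (\mathbf{I} - \mathbf{W})$, $\mathbf{c} = \mathbf{e}_j$
and $\mathbf{d} = \mathbf{w}_j^{\mathrm{r}}$ in Equation (A.1), we obtain

$$\mathbf{Z}^{(-j)} = (\mathbf{I} - \mathbf{W}^{(-j)})^{-1} = \mathbf{Z} - \frac{\mathbf{Z}\mathbf{e}_j (\mathbf{w}_j^{\mathrm{r}})^{\mathrm{T}} \mathbf{Z}}{1 + (\mathbf{w}_j^{\mathrm{r}})^{\mathrm{T}} \mathbf{Z} \mathbf{e}_j} \tag{A.2}$$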

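Let us first compute the term $(\mathbf{w}_j^{\mathrm{r}})^{\mathrm{T}}\mathbf{Z}$
appearing both in the numerator and the denominator of the previous equation
(A.2). Since $\mathbf{Z} = (\mathbf{I} - \mathbf{W})^{-1}$, we have
$(\mathbf{I} - \mathbf{W})\mathbf{Z} = \mathbf{I}$, and

$$(\mathbf{w}_j^{\mathrm{r}})^{\mathrm{T}} \mathbf{Z} = (\mathbf{w}_j^{\mathrm{r}} - \mathbf{e}_j + \mathbf{e}_j)^{\mathrm{T}} \mathbf{Z} = -(\mathbf{e}_j - \mathbf{w}_j^{\mathrm{r}})^{\mathrm{T}} \mathbf{Z} + \mathbf{e}_j^{\mathrm{T}} \mathbf{Z} = -\mathbf{e}_j^{\mathrm{T}} + (\mathbf{z}_j^{\mathrm{r}})^{\mathrm{T}} = (\mathbf{z}_j^{\mathrm{r}})^{\mathrm{T}} - \mathbf{e}_j^{\mathrm{T}} \tag{A.3}$$

where $\mathbf{z}_j^{\mathrm{r}} = \mathbf{col}_j(\mathbf{Z}^{\mathrm{T}})$ is
row $j$ of $\mathbf{Z}$ taken as a column vector.

From Equation (A.3), the denominator of the second term in the right-hand
side of Equation (A.2) is

$$1 + (\mathbf{w}_j^{\mathrm{r}})^{\mathrm{T}} \mathbf{Z} \mathbf{e}_j = 1 + \left((\mathbf{z}_j^{\mathrm{r}})^{\mathrm{T}} - \mathbf{e}_j^{\mathrm{T}}\right) \mathbf{e}_j = (\mathbf{z}_j^{\mathrm{r}})^{\mathrm{T}} \mathbf{e}_j = z_{jj} \tag{A.4}$$

Moreover, also from Equation (A.3), the numerator of the second term in
the right-hand side of Equation (A.2) is

$$\mathbf{Z} \mathbf{e}_j (\mathbf{w}_j^{\mathrm{r}})^{\mathrm{T}} \mathbf{Z} = \mathbf{z}_j^{\mathrm{c}} \left((\mathbf{z}_j^{\mathrm{r}})^{\mathrm{T}} - \mathbf{e}_j^{\mathrm{T}}\right) \tag{A.5}$$

where $\mathbf{z}_j^{\mathrm{c}} = \mathbf{col}_j(\mathbf{Z}) = \mathbf{Z}\mathbf{e}_j$
is column $j$ of matrix $\mathbf{Z}$.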

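We substitute the results (A.4) and (A.5) in the denominator and the numerator
of Equation (A.2), providing

$$\mathbf{Z}^{(-j)} = \mathbf{Z} - \frac{\mathbf{z}_j^{\mathrm{c}} \left((\mathbf{z}_j^{\mathrm{r}})^{\mathrm{T}} - \mathbf{e}_j^{\mathrm{T}}\right)}{z_{jj}} \tag{A.6}$$

Let us now compute the $i, j$ entry of matrix $\mathbf{Z}^{(-j)}$. From the
previous Equation (A.6),

$$\begin{aligned}
z^{(-j)}_{ij} &= \mathbf{e}_i^{\mathrm{T}} \mathbf{Z}^{(-j)} \mathbf{e}_j = \mathbf{e}_i^{\mathrm{T}} \left( \mathbf{Z} - \frac{\mathbf{z}_j^{\mathrm{c}} \left((\mathbf{z}_j^{\mathrm{r}})^{\mathrm{T}} - \mathbf{e}_j^{\mathrm{T}}\right)}{z_{jj}} \right) \mathbf{e}_j \\
&= z_{ij} - \frac{z_{ij} \left((\mathbf{z}_j^{\mathrm{r}})^{\mathrm{T}} - \mathbf{e}_j^{\mathrm{T}}\right) \mathbf{e}_j}{z_{jj}} = z_{ij} - \frac{z_{ij}\,(z_{jj} - 1)}{z_{jj}} = \frac{z_{ij}}{z_{jj}}
\end{aligned} \tag{A.7}$$

which provides a simple expression for computing the fundamental matrix for
hitting paths.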
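The closed form above can be checked numerically. The following is a minimal
sketch, assuming a small random substochastic matrix W (an illustrative
example, not data from the paper); the function names are hypothetical helpers
introduced here. It compares column j of the directly inverted matrix, obtained
after zeroing row j of W, with the ratio z_ij / z_jj.

```python
import numpy as np

def fundamental_matrix(W):
    """Z = (I - W)^{-1}."""
    return np.linalg.inv(np.eye(W.shape[0]) - W)

def hitting_column_direct(W, j):
    """Column j of Z^(-j): zero row j of W (node j becomes absorbing), then invert."""
    W_minus_j = W.copy()
    W_minus_j[j, :] = 0.0
    return fundamental_matrix(W_minus_j)[:, j]

def hitting_matrix_closed_form(W):
    """Matrix with entries z_ij / z_jj, as obtained from Sherman-Morrison."""
    Z = fundamental_matrix(W)
    return Z / np.diag(Z)  # broadcasting divides column j by z_jj

# Illustrative input (assumption): random reference walk and random positive costs.
rng = np.random.default_rng(0)
n = 5
A = rng.uniform(0.1, 1.0, size=(n, n))
np.fill_diagonal(A, 0.0)
P_ref = A / A.sum(axis=1, keepdims=True)
costs = rng.uniform(0.5, 2.0, size=(n, n))
W = P_ref * np.exp(-1.0 * costs)  # theta = 1; row sums are < 1, so I - W is invertible

Zh = hitting_matrix_closed_form(W)
max_gap = max(
    np.max(np.abs(hitting_column_direct(W, j) - Zh[:, j])) for j in range(n)
)
print("largest discrepancy:", max_gap)
```

The closed form needs a single matrix inversion, whereas the direct route
inverts one matrix per target node j.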
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Appendix B. Triangular inequality proof for the surprisal distance 

In order for $\Delta^{\mathrm{sur}}_{ij}$ to be a distance measure, it remains
to be shown that it obeys the triangle inequality,
$\Delta^{\mathrm{sur}}_{ik} \le \Delta^{\mathrm{sur}}_{ij} + \Delta^{\mathrm{sur}}_{jk}$
for all $i, j, k$. Notice that $\Delta^{\mathrm{sur}}_{ij} = \infty$ when node
$i$ and node $j$ are not connected (they belong to different connected
components). In addition, note that the triangle inequality is trivially
satisfied if either $i = j$, $j = k$ or $i = k$. Thus, we only need to prove
the case $i \ne j \ne k \ne i$.

In order to prove the triangle inequality, consider the set of paths
$\mathcal{P}_{ik}$, including zero-length paths, from node $i$ to node $k$. We
now compute the probability that such paths pass through an intermediate node
$int = j$ when $i \ne j \ne k \ne i$,

$$\mathrm{P}(s = i, int = j, e = k) = \frac{\sum_{\wp \in \mathcal{P}_{ik}} \delta(j \in \wp)\, \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)]}{\sum_{\wp' \in \mathcal{P}} \tilde{\pi}^{\mathrm{ref}}(\wp') \exp[-\theta \tilde{c}(\wp')]} \tag{B.1}$$

where $\delta(j \in \wp)$ is a Kronecker delta equal to 1 if the path $\wp$
contains (at least once) node $j$, and 0 otherwise. It is clear from Equations
(19) and (B.1) that

$$\mathrm{P}(s = i, e = k) \ge \mathrm{P}(s = i, int = j, e = k), \quad \text{for } i \ne j \ne k \ne i \tag{B.2}$$

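Let us transform Equation (B.1), using the fact that each path $\wp_{ik}$
between $i$ and $k$ passing through $j$ can be decomposed uniquely into a
hitting sub-path $\wp_{ij}$ from $i$ to $j$ and a regular sub-path $\wp_{jk}$
from $j$ to $k$. The sub-path $\wp_{ij}$ is found by following path $\wp_{ik}$
until reaching $j$ for the first time (this is the main reason why we
introduced hitting paths). Therefore, for $i \ne j \ne k \ne i$,

$$\begin{aligned}
\mathrm{P}(s = i, int = j, e = k)
&= \frac{\sum_{\wp \in \mathcal{P}_{ik}} \delta(j \in \wp)\, \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)]}{\mathcal{Z}} \\
&= \frac{\sum_{\wp_{ij} \in \mathcal{P}^{\mathrm{h}}_{ij}} \sum_{\wp_{jk} \in \mathcal{P}_{jk}} \tilde{\pi}^{\mathrm{ref}}(\wp_{ij})\, \tilde{\pi}^{\mathrm{ref}}(\wp_{jk}) \exp[-\theta \tilde{c}(\wp_{ij})] \exp[-\theta \tilde{c}(\wp_{jk})]}{\mathcal{Z}} \\
&= \mathcal{Z}_{\mathrm{h}}\, \frac{\sum_{\wp_{ij} \in \mathcal{P}^{\mathrm{h}}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp_{ij}) \exp[-\theta \tilde{c}(\wp_{ij})]}{\mathcal{Z}_{\mathrm{h}}}\; \frac{\sum_{\wp_{jk} \in \mathcal{P}_{jk}} \tilde{\pi}^{\mathrm{ref}}(\wp_{jk}) \exp[-\theta \tilde{c}(\wp_{jk})]}{\mathcal{Z}} \\
&= \mathcal{Z}_{\mathrm{h}}\, \mathrm{P}_{\mathrm{h}}(s = i, e = j)\, \mathrm{P}(s = j, e = k)
\end{aligned} \tag{B.3}$$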



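Combining Inequality (B.2) and Equation (B.3) yields

$$\mathrm{P}(s = i, e = k) \ge \mathcal{Z}_{\mathrm{h}}\, \mathrm{P}_{\mathrm{h}}(s = i, e = j)\, \mathrm{P}(s = j, e = k), \quad \text{for } i \ne j \ne k \ne i \tag{B.4}$$

Replacing the regular bag-of-paths probabilities by their expressions in terms
of the elements of the fundamental matrix (see Equation (17)),
$\mathrm{P}(s = i, e = k) = z_{ik}/\mathcal{Z}$ and
$\mathrm{P}(s = j, e = k) = z_{jk}/\mathcal{Z}$, in this last inequality and
using $\mathrm{P}_{\mathrm{h}}(s = i, e = k) = z_{ik}/(\mathcal{Z}_{\mathrm{h}} z_{kk})$
(see Equation (27)) provides

$$\begin{aligned}
& \mathrm{P}(s = i, e = k) \ge \mathcal{Z}_{\mathrm{h}}\, \mathrm{P}_{\mathrm{h}}(s = i, e = j)\, \mathrm{P}(s = j, e = k) \\
\Rightarrow\; & \frac{z_{ik}}{\mathcal{Z}} \ge \mathcal{Z}_{\mathrm{h}}\, \mathrm{P}_{\mathrm{h}}(s = i, e = j)\, \frac{z_{jk}}{\mathcal{Z}} \\
\Rightarrow\; & \frac{z_{ik}}{\mathcal{Z}_{\mathrm{h}} z_{kk}} \ge \mathcal{Z}_{\mathrm{h}}\, \mathrm{P}_{\mathrm{h}}(s = i, e = j)\, \frac{z_{jk}}{\mathcal{Z}_{\mathrm{h}} z_{kk}} \\
\Rightarrow\; & \mathrm{P}_{\mathrm{h}}(s = i, e = k) \ge \mathcal{Z}_{\mathrm{h}}\, \mathrm{P}_{\mathrm{h}}(s = i, e = j)\, \mathrm{P}_{\mathrm{h}}(s = j, e = k)
\end{aligned} \tag{B.5}$$

for $i \ne j \ne k \ne i$. Now, from Equation (28) and the fact that
$z^{\mathrm{h}}_{ij} \ge 0$, it is clear that $\mathcal{Z}_{\mathrm{h}} \ge 1$;
thus

$$\mathrm{P}_{\mathrm{h}}(s = i, e = k) \ge \mathrm{P}_{\mathrm{h}}(s = i, e = j)\, \mathrm{P}_{\mathrm{h}}(s = j, e = k), \quad \text{for } i \ne j \ne k \ne i \tag{B.6}$$

Finally, by taking $-\log$ of Inequality (B.6), we obtain

$$-\log \mathrm{P}_{\mathrm{h}}(s = i, e = k) \le -\log \mathrm{P}_{\mathrm{h}}(s = i, e = j) - \log \mathrm{P}_{\mathrm{h}}(s = j, e = k) \tag{B.7}$$

for $i \ne j \ne k \ne i$. Thus, the surprisal measure,
$-\log \mathrm{P}_{\mathrm{h}}(s = i, e = j)$, obeys the triangle inequality.
Therefore the distance
$\Delta^{\mathrm{sur}}_{ij} = -(\log \mathrm{P}_{\mathrm{h}}(s = i, e = j) + \log \mathrm{P}_{\mathrm{h}}(s = j, e = i))/2$
also enjoys this property.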
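The triangle inequality can also be checked numerically on a small graph. The
sketch below is only an illustration under stated assumptions: a random
symmetric weighted graph, the natural random walk as reference probabilities,
costs taken as inverse weights, and a normalization of the hitting
probabilities by the sum of all z_ij / z_jj. The function names are
hypothetical and not from the original.

```python
import itertools
import numpy as np

def surprisal_distance(A, C, theta):
    """Symmetrized surprisal -(log P_h(i,j) + log P_h(j,i)) / 2, zero on the diagonal."""
    n = A.shape[0]
    P_ref = A / A.sum(axis=1, keepdims=True)  # natural random walk (assumption)
    W = P_ref * np.exp(-theta * C)
    Z = np.linalg.inv(np.eye(n) - W)
    Zh = Z / np.diag(Z)                       # z^h_ij = z_ij / z_jj
    Ph = Zh / Zh.sum()                        # hitting-path probabilities P_h(s=i, e=j)
    S = -np.log(Ph)                           # directed surprisal
    D = 0.5 * (S + S.T)
    np.fill_diagonal(D, 0.0)
    return D

def max_triangle_violation(D):
    """Largest value of D[i,k] - D[i,j] - D[j,k]; it should not be positive."""
    n = D.shape[0]
    return max(
        D[i, k] - D[i, j] - D[j, k]
        for i, j, k in itertools.product(range(n), repeat=3)
    )

# Illustrative random graph (assumption): symmetric weights, cost = 1 / weight.
rng = np.random.default_rng(1)
n = 6
A = rng.uniform(0.1, 1.0, size=(n, n))
A = 0.5 * (A + A.T)
np.fill_diagonal(A, 0.0)
C = np.zeros_like(A)
mask = A > 0
C[mask] = 1.0 / A[mask]

D = surprisal_distance(A, C, theta=1.0)
violation = max_triangle_violation(D)
print("largest triangle violation:", violation)
```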
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Appendix C. Proof of the geodetic property of the potential distance 

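From the definition of the bag-of-paths probability (Equation (8)), as well
as Equation (B.1) defining $\mathrm{P}(s = i, int = j, e = k)$, we have for
$i \ne j \ne k \ne i$

$$\begin{aligned}
\mathrm{P}(s = i, e = k)
&= \frac{\sum_{\wp \in \mathcal{P}_{ik}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)]}{\mathcal{Z}} \\
&= \frac{\sum_{\wp \in \mathcal{P}_{ik}} \delta(j \in \wp)\, \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)] + \sum_{\wp \in \mathcal{P}_{ik}} (1 - \delta(j \in \wp))\, \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)]}{\mathcal{Z}} \\
&= \mathrm{P}(s = i, int = j, e = k) + \frac{\sum_{\wp \in \mathcal{P}_{ik}} \delta(j \notin \wp)\, \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)]}{\mathcal{Z}}
\end{aligned} \tag{C.1}$$

Now, substituting $\mathrm{P}(s = i, int = j, e = k)$ by
$\mathcal{Z}_{\mathrm{h}}\, \mathrm{P}_{\mathrm{h}}(s = i, e = j)\, \mathrm{P}(s = j, e = k)$
(see Equation (B.3)) in the previous equation yields

$$\mathrm{P}(s = i, e = k) = \mathcal{Z}_{\mathrm{h}}\, \mathrm{P}_{\mathrm{h}}(s = i, e = j)\, \mathrm{P}(s = j, e = k) + \frac{\sum_{\wp \in \mathcal{P}_{ik}} \delta(j \notin \wp)\, \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)]}{\mathcal{Z}} \tag{C.2}$$

Now, recalling that $\mathrm{P}(s = i, e = k) = z_{ik}/\mathcal{Z}$ (Equation
(17)) and $\mathrm{P}_{\mathrm{h}}(s = i, e = j) = z^{\mathrm{h}}_{ij}/\mathcal{Z}_{\mathrm{h}}$
(Equation (27)), we transform Equation (C.2) into

$$z_{ik} = z^{\mathrm{h}}_{ij}\, z_{jk} + \sum_{\wp \in \mathcal{P}_{ik}} \delta(j \notin \wp)\, \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)]$$

Dividing both sides of the previous equation by $z_{kk}$ and recalling that
$z^{\mathrm{h}}_{ik} = z_{ik}/z_{kk}$ (Equation (26)) provides

$$z^{\mathrm{h}}_{ik} = z^{\mathrm{h}}_{ij}\, z^{\mathrm{h}}_{jk} + \frac{1}{z_{kk}} \sum_{\wp \in \mathcal{P}_{ik}} \delta(j \notin \wp)\, \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)]$$

and we recover $z^{\mathrm{h}}_{ik} \ge z^{\mathrm{h}}_{ij} z^{\mathrm{h}}_{jk}$
(Equation (37)). The equality
$z^{\mathrm{h}}_{ik} = z^{\mathrm{h}}_{ij} z^{\mathrm{h}}_{jk}$
($i \ne j \ne k \ne i$) holds if and only if
$\sum_{\wp \in \mathcal{P}_{ik}} \delta(j \notin \wp)\, \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)] = 0$,
which only occurs when all paths connecting $i$ and $k$ pass through node $j$.
Thus, it is clear that
$\Delta^{\phi}_{ik} = \Delta^{\phi}_{ij} + \Delta^{\phi}_{jk}$,
$i \ne j \ne k \ne i$, if and only if all paths $\wp \in \mathcal{P}_{ik}$
connecting the source node $i$ and the destination node $k$ pass through node
$j$. This property is called the graph-geodetic property in [13].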
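A small numerical illustration of the graph-geodetic property follows. It is a
sketch under assumptions that are not in the original: unit edge costs, the
natural random walk as reference, and two toy graphs on three nodes. On the
path graph 0 - 1 - 2 every path from 0 to 2 visits node 1, so the product
relation holds with equality; adding the edge 0 - 2 makes the inequality
strict.

```python
import numpy as np

def hitting_matrix(A, C, theta):
    """Matrix of z^h_ij = z_ij / z_jj for the natural random walk on A."""
    n = A.shape[0]
    P_ref = A / A.sum(axis=1, keepdims=True)
    W = P_ref * np.exp(-theta * C)
    Z = np.linalg.inv(np.eye(n) - W)
    return Z / np.diag(Z)

theta = 1.0
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]], dtype=float)   # 0 - 1 - 2: node 1 is a cut vertex
triangle = np.ones((3, 3)) - np.eye(3)      # same nodes plus the edge 0 - 2

# Unit costs on existing edges, so the adjacency matrix doubles as the cost matrix.
Zh_path = hitting_matrix(path, path, theta)
Zh_tri = hitting_matrix(triangle, triangle, theta)

gap_path = Zh_path[0, 2] - Zh_path[0, 1] * Zh_path[1, 2]
gap_tri = Zh_tri[0, 2] - Zh_tri[0, 1] * Zh_tri[1, 2]
print("path graph gap:", gap_path)
print("triangle graph gap:", gap_tri)
```

Taking -log(.)/theta of both sides turns the product equality into additivity
of the potential along the forced intermediate node.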



Appendix D. Asymptotic result: for an undirected graph, the $\Delta^{\phi}$ distance
converges to the shortest path distance when $\theta \to \infty$

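There are two ways to prove this property, each of them having its own
benefits. The first proof is based on the bag-of-paths framework and is
shorter. The second proof is largely inspired by [72] and is longer, but
establishes some interesting links with the Bellman-Ford formula for computing
the shortest path in a network (see, e.g., [5, 16, 19, 61, 68]).

Appendix D.1. First proof

Let us recall (Equation (38)) that, assuming $i \ne j$,
$\Delta^{\phi}_{ij} = (\phi(i,j) + \phi(j,i))/2$ with
$\phi(i,j) = -\frac{1}{\theta} \log z^{\mathrm{h}}_{ij}$, and where
$z^{\mathrm{h}}_{ij}$ is given by (Equation (26)):

$$z^{\mathrm{h}}_{ij} = \sum_{\wp \in \mathcal{P}^{\mathrm{h}}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)] \tag{D.1}$$

We now have to compute the asymptotic form of $z^{\mathrm{h}}_{ij}$ for
$\theta \to \infty$. Let the lowest-cost paths from $i$ to $j$ be denoted as
$\wp^{*}_{l}$ and let $c^{*} = \tilde{c}(\wp^{*}_{l})$ be the cost of such a
lowest-cost path. $c^{*}$ is therefore the minimum cost among all possible
paths from $i$ to $j$. Say there are $m$ such lowest-cost paths. Now, since
$\sum_{\wp \in \mathcal{P}^{\mathrm{h}}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) = 1$,
it is clear that $z^{\mathrm{h}}_{ij}$ is bounded by

$$z^{\mathrm{h}}_{ij} \le \sum_{\wp \in \mathcal{P}^{\mathrm{h}}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta c^{*}] = \exp[-\theta c^{*}] \sum_{\wp \in \mathcal{P}^{\mathrm{h}}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) = \exp[-\theta c^{*}] \tag{D.2}$$

and is therefore finite. We also observe that it converges exponentially to 0
when $\theta \to \infty$. We can now rewrite

$$\begin{aligned}
z^{\mathrm{h}}_{ij}
&= \sum_{\wp \in \mathcal{P}^{\mathrm{h}}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)]
= \exp[-\theta c^{*}] \sum_{\wp \in \mathcal{P}^{\mathrm{h}}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta (\tilde{c}(\wp) - c^{*})] \\
&= \exp[-\theta c^{*}] \left( \sum_{l=1}^{m} \tilde{\pi}^{\mathrm{ref}}(\wp^{*}_{l}) + \sum_{\substack{\wp \in \mathcal{P}^{\mathrm{h}}_{ij} \\ \tilde{c}(\wp) > c^{*}}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta (\tilde{c}(\wp) - c^{*})] \right)
\end{aligned} \tag{D.3}$$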



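Let us now compute the potential
$\phi(i,j) = -\frac{1}{\theta} \log z^{\mathrm{h}}_{ij}$ when
$\theta \to \infty$. Using Equation (D.3) we get

$$\begin{aligned}
\phi(i,j)
&= -\frac{1}{\theta} \log \left[ \exp[-\theta c^{*}] \left( \sum_{l=1}^{m} \tilde{\pi}^{\mathrm{ref}}(\wp^{*}_{l}) + \sum_{\substack{\wp \in \mathcal{P}^{\mathrm{h}}_{ij} \\ \tilde{c}(\wp) > c^{*}}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta (\tilde{c}(\wp) - c^{*})] \right) \right] \\
&= c^{*} - \frac{1}{\theta} \log \left( \sum_{l=1}^{m} \tilde{\pi}^{\mathrm{ref}}(\wp^{*}_{l}) + \sum_{\substack{\wp \in \mathcal{P}^{\mathrm{h}}_{ij} \\ \tilde{c}(\wp) > c^{*}}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta (\tilde{c}(\wp) - c^{*})] \right)
\xrightarrow[\theta \to \infty]{} c^{*}
\end{aligned} \tag{D.4}$$

Here, the last limit applies since the expression inside the logarithm is
bounded (the first term is a positive constant and the second decays to 0).

Moreover, observing that, in the case of an undirected graph, the lowest
cost from $j$ to $i$ is equal to the lowest cost from $i$ to $j$ (i.e.,
$c^{*}$), the distance
$\Delta^{\phi}_{ij} = (\phi(i,j) + \phi(j,i))/2 \to c^{*}$ when
$\theta \to \infty$. Therefore, the bag-of-hitting-paths potential distance
provides the shortest path distance when $\theta \to \infty$.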
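This limit can be observed numerically. The following sketch rests on
assumptions that are not in the original: a hypothetical 4-node undirected
graph, the natural random walk as reference, and two arbitrary values of theta.
It compares the potential distance with all-pairs shortest path costs computed
by Floyd-Warshall.

```python
import numpy as np

def potential_distance(A, C, theta):
    """(phi(i,j) + phi(j,i)) / 2 with phi(i,j) = -log(z_ij / z_jj) / theta."""
    n = A.shape[0]
    P_ref = A / A.sum(axis=1, keepdims=True)
    W = P_ref * np.exp(-theta * C)
    Z = np.linalg.inv(np.eye(n) - W)
    phi = -np.log(Z / np.diag(Z)) / theta
    return 0.5 * (phi + phi.T)

def shortest_path_costs(A, C):
    """All-pairs lowest path cost via Floyd-Warshall."""
    n = A.shape[0]
    D = np.where(A > 0, C, np.inf)
    np.fill_diagonal(D, 0.0)
    for k in range(n):
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

# Hypothetical example graph: (node, node, cost) triples; edge 0-2 is a costly shortcut.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 2, 3.0)]
n = 4
A = np.zeros((n, n))
C = np.zeros((n, n))
for i, j, c in edges:
    A[i, j] = A[j, i] = 1.0
    C[i, j] = C[j, i] = c

D_sp = shortest_path_costs(A, C)
err_small_theta = np.max(np.abs(potential_distance(A, C, 2.0) - D_sp))
err_large_theta = np.max(np.abs(potential_distance(A, C, 20.0) - D_sp))
print("max deviation at theta = 2:", err_small_theta)
print("max deviation at theta = 20:", err_large_theta)
```

The residual at finite theta comes from the -(1/theta) log term in the
derivation, which shrinks like 1/theta.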
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Appendix D.2. Second proof 

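The second proof starts from the definition of the fundamental matrix
(Equation (12)), $\mathbf{Z} = (\mathbf{I} - \mathbf{W})^{-1}$, which can be
rewritten as $\mathbf{Z} = \mathbf{W}\mathbf{Z} + \mathbf{I}$. Or, in
elementwise form,

$$z_{ik} = \sum_{j=1}^{n} w_{ij}\, z_{jk} + \delta_{ik} = \sum_{j=1}^{n} p^{\mathrm{ref}}_{ij} \exp[-\theta c_{ij}]\, z_{jk} + \delta_{ik} \tag{D.5}$$

since, from Equation (5), $w_{ij} = p^{\mathrm{ref}}_{ij} \exp[-\theta c_{ij}]$.
From $z^{\mathrm{h}}_{ik} = z_{ik}/z_{kk}$, we immediately deduce that
$z^{\mathrm{h}}_{ik} = 1$ when $i = k$ and
$z^{\mathrm{h}}_{ik} = \sum_{j=1}^{n} p^{\mathrm{ref}}_{ij} \exp[-\theta c_{ij}]\, z^{\mathrm{h}}_{jk}$
when $i \ne k$. Therefore,

$$z^{\mathrm{h}}_{ik} = \begin{cases} \sum_{j=1}^{n} p^{\mathrm{ref}}_{ij} \exp[-\theta c_{ij}]\, z^{\mathrm{h}}_{jk} & \text{for } i \ne k \\ 1 & \text{for } i = k \end{cases} \tag{D.6}$$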



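Let us now compute the value of the potential $\phi(i,k)$ (Equation (38)) for
$i \ne k$ (when $i = k$, $\phi(k,k) = 0$),

$$\begin{aligned}
\phi(i,k) = -\frac{1}{\theta} \log z^{\mathrm{h}}_{ik}
&= -\frac{1}{\theta} \log \left[ \sum_{j=1}^{n} p^{\mathrm{ref}}_{ij} \exp[-\theta c_{ij}]\, z^{\mathrm{h}}_{jk} \right] \\
&= -\frac{1}{\theta} \log \left[ \sum_{j=1}^{n} p^{\mathrm{ref}}_{ij} \exp[-\theta c_{ij}] \exp\!\left[-\theta \left(-\tfrac{1}{\theta} \log z^{\mathrm{h}}_{jk}\right)\right] \right] \\
&= -\frac{1}{\theta} \log \left[ \sum_{j=1}^{n} p^{\mathrm{ref}}_{ij} \exp[-\theta c_{ij}] \exp[-\theta \phi(j,k)] \right] \\
&= -\frac{1}{\theta} \log \left[ \sum_{j=1}^{n} p^{\mathrm{ref}}_{ij} \exp[-\theta (c_{ij} + \phi(j,k))] \right]
\end{aligned} \tag{D.7}$$

which provides a recurrence formula for computing $\phi(i,k)$, together with
the boundary condition
$\phi(k,k) = -\frac{1}{\theta} \log (z_{kk}/z_{kk}) = 0$.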

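Let us now study the behavior of this equation for $\theta \to \infty$. We
first observe that both the numerator and the denominator tend to $+\infty$
when $\theta \to \infty$. To handle this indeterminate form, and in order to
simplify the notation, we will study the function
$f_{\theta} = -\log\left(\sum_{j=1}^{n} q_j \exp[-\theta x_j]\right)/\theta$
with $\sum_{j=1}^{n} q_j = 1$ instead, with $x_j = (c_{ij} + \phi(j,k))$ and
$q_j = p^{\mathrm{ref}}_{ij}$ [72]. Let us further define
$x^{*} = \min_j(x_j)$ so that $(x_j - x^{*}) \ge 0$; we then have

$$\begin{aligned}
\lim_{\theta \to \infty} f_{\theta}
&= \lim_{\theta \to \infty} -\frac{1}{\theta} \log \left( \sum_{j=1}^{n} q_j \exp[-\theta x_j] \right) \\
&= \lim_{\theta \to \infty} -\frac{1}{\theta} \log \left( \exp[-\theta x^{*}] \sum_{j=1}^{n} q_j \exp[-\theta (x_j - x^{*})] \right) \\
&= x^{*} - \lim_{\theta \to \infty} \frac{1}{\theta} \log \left( \sum_{j=1}^{n} q_j \exp[-\theta (x_j - x^{*})] \right) \\
&= x^{*}
\end{aligned} \tag{D.8}$$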



and the last limit is 0 since at least one of the $x_j$ is exactly equal to
$x^{*}$, so that the sum $\sum_{j=1}^{n} q_j \exp[-\theta (x_j - x^{*})]$
remains strictly positive (and at most 1); its logarithm therefore stays
bounded while $1/\theta \to 0$.

Thus, when $\theta \to \infty$, Equation (D.7) becomes
$\phi(i,k) = \min_j (c_{ij} + \phi(j,k))$ for $i \ne k$ and $\phi(k,k) = 0$,
which is the well-known Bellman-Ford formula for computing the shortest path
distance in an undirected graph [5, 16, 19, 35, 61, 68]. Now, for an
undirected graph, the shortest path cost from $i$ to $j$ is equal to the
shortest path cost from $j$ to $i$, which implies that $\Delta^{\phi}_{ij}$
also reduces to the shortest path distance when $\theta \to \infty$.
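The limit of the soft-minimum function can be illustrated with a few lines of
code. This is a minimal sketch with made-up values for x and q (assumptions for
illustration only); it evaluates the function after factoring out the minimum,
which is the same manipulation used in the derivation and also avoids numerical
underflow for large theta.

```python
import math

def soft_min(x, q, theta):
    """f_theta = -log(sum_j q_j exp(-theta x_j)) / theta, with x* = min(x) factored out."""
    x_star = min(x)
    s = sum(qj * math.exp(-theta * (xj - x_star)) for xj, qj in zip(x, q))
    return x_star - math.log(s) / theta

# Hypothetical values: x_j plays the role of c_ij + phi(j, k), q_j of p_ij^ref.
x = [3.0, 1.5, 2.0]
q = [0.2, 0.3, 0.5]

for theta in (1.0, 10.0, 100.0, 1000.0):
    print(theta, soft_min(x, q, theta))
```

As theta grows the printed values approach min(x). Evaluated at a very small
theta, the same function approaches the weighted mean of the x_j, which is the
regime studied in Appendix E.2.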

We now show that the $\Delta^{\phi}$ distance converges to half the commute
cost distance when $\theta \to 0^{+}$.



Appendix E. Asymptotic result: for an undirected graph, the $\Delta^{\phi}$ distance
converges to half the commute cost distance when $\theta \to 0^{+}$

As before, there are two ways to prove this property. The first proof is based 
on the bag-of-paths framework and is shorter. The second proof, also inspired 
by [72], is longer, but establishes some interesting links with the recurrence 
formula computing the average first-passage cost in a network (see, e.g., [38,
56, 62, 74]).

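Appendix E.1. First proof

From Equations (38) and (D.1),

$$\Delta^{\phi}_{ij} = -\frac{\log z^{\mathrm{h}}_{ij} + \log z^{\mathrm{h}}_{ji}}{2\theta} = -\frac{\log\left(\sum_{\wp \in \mathcal{P}^{\mathrm{h}}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)]\right) + \log\left(\sum_{\wp \in \mathcal{P}^{\mathrm{h}}_{ji}} \tilde{\pi}^{\mathrm{ref}}(\wp) \exp[-\theta \tilde{c}(\wp)]\right)}{2\theta} \tag{E.1}$$

and, since
$\sum_{\wp \in \mathcal{P}^{\mathrm{h}}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp) = 1$,
both the numerator and the denominator tend to zero when $\theta \to 0^{+}$.
For taking the limit $\theta \to 0^{+}$ of the whole expression (E.1), we apply
l'Hospital's rule (taking the derivative of the numerator and the denominator
with respect to $\theta$ and then the limit $\lim_{\theta \to 0^{+}}$ of the
resulting expression), which provides

$$\lim_{\theta \to 0^{+}} \Delta^{\phi}_{ij} = \frac{\sum_{\wp \in \mathcal{P}^{\mathrm{h}}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp)\, \tilde{c}(\wp) + \sum_{\wp \in \mathcal{P}^{\mathrm{h}}_{ji}} \tilde{\pi}^{\mathrm{ref}}(\wp)\, \tilde{c}(\wp)}{2} \tag{E.2}$$

The quantity
$\sum_{\wp \in \mathcal{P}^{\mathrm{h}}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp)\, \tilde{c}(\wp)$
can be interpreted as the average first-passage cost from $i$ to $j$, i.e.,
the average cost incurred by a random walker using transition probabilities
$p^{\mathrm{ref}}_{ij}$ for reaching destination node $j$ for the first time
when starting from $i$. Consequently, the average of the two quantities
defined in (E.2) is half the commute cost distance.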
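The l'Hospital step can be written out explicitly. The following is a sketch
only; the shorthand $N_{ij}(\theta)$ is introduced here for illustration and
does not appear in the original, and term-by-term differentiation of the sum
over paths is assumed to be valid.

```latex
% Shorthand (assumption): one half of the numerator of the distance.
N_{ij}(\theta) = -\log \sum_{\wp \in \mathcal{P}^{\mathrm{h}}_{ij}}
    \tilde{\pi}^{\mathrm{ref}}(\wp)\, e^{-\theta \tilde{c}(\wp)},
\qquad N_{ij}(0) = -\log 1 = 0 .

% Derivative with respect to theta and its value at theta -> 0+.
\frac{d N_{ij}}{d\theta}
  = \frac{\sum_{\wp} \tilde{\pi}^{\mathrm{ref}}(\wp)\, \tilde{c}(\wp)\, e^{-\theta \tilde{c}(\wp)}}
         {\sum_{\wp} \tilde{\pi}^{\mathrm{ref}}(\wp)\, e^{-\theta \tilde{c}(\wp)}}
  \;\xrightarrow[\theta \to 0^{+}]{}\;
  \sum_{\wp \in \mathcal{P}^{\mathrm{h}}_{ij}} \tilde{\pi}^{\mathrm{ref}}(\wp)\, \tilde{c}(\wp) .

% The denominator 2*theta has derivative 2, hence
\lim_{\theta \to 0^{+}} \frac{N_{ij}(\theta) + N_{ji}(\theta)}{2\theta}
  = \frac{N'_{ij}(0) + N'_{ji}(0)}{2} .
```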

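Appendix E.2. Second proof

Restarting from Equation (D.7), we now have to take the limit
$\theta \to 0^{+}$. Assuming $\sum_{j=1}^{n} q_j = 1$, let us compute the limit
of $f_{\theta}$ for $\theta \to 0^{+}$ (instead of $\theta \to \infty$ as in
Equation (D.8)) by applying l'Hospital's rule,

$$\lim_{\theta \to 0^{+}} f_{\theta} = \lim_{\theta \to 0^{+}} -\frac{\log\left(\sum_{j=1}^{n} q_j \exp[-\theta x_j]\right)}{\theta} = \lim_{\theta \to 0^{+}} \frac{\sum_{j=1}^{n} q_j x_j \exp[-\theta x_j]}{\sum_{j=1}^{n} q_j \exp[-\theta x_j]} = \sum_{j=1}^{n} q_j x_j \tag{E.3}$$

Therefore, since in our case $x_j = (c_{ij} + \phi(j,k))$ and
$q_j = p^{\mathrm{ref}}_{ij}$ with $\sum_{j=1}^{n} p^{\mathrm{ref}}_{ij} = 1$,
we obtain
$\phi(i,k) = \sum_{j=1}^{n} p^{\mathrm{ref}}_{ij} (c_{ij} + \phi(j,k))$ for
$i \ne k$, together with the boundary condition $\phi(k,k) = 0$. But this is
exactly the recurrence formula computing the average first-passage cost in an
ergodic Markov chain [38, 56, 62, 74]. Thus, when $\theta \to 0^{+}$,
$\Delta^{\phi}_{ij} = (\phi(i,j) + \phi(j,i))/2$ reduces to half the commute
cost distance between $i$ and $j$.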
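The small-theta limit can be checked against a direct solution of the
first-passage recurrence. The sketch below is illustrative only and assumes a
hypothetical 4-node undirected graph, the natural random walk as reference, and
theta = 1e-5 as a stand-in for the limit; the function names are introduced
here and are not from the original.

```python
import numpy as np

def potential(A, C, theta):
    """phi(i,k) = -log(z_ik / z_kk) / theta for the natural random walk on A."""
    n = A.shape[0]
    P_ref = A / A.sum(axis=1, keepdims=True)
    W = P_ref * np.exp(-theta * C)
    Z = np.linalg.inv(np.eye(n) - W)
    return -np.log(Z / np.diag(Z)) / theta

def first_passage_cost(A, C):
    """Solve m(i,k) = sum_j p_ij (c_ij + m(j,k)) for i != k, with m(k,k) = 0."""
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)
    r = (P * C).sum(axis=1)                # expected cost of the next step
    M = np.zeros((n, n))
    for k in range(n):
        keep = [i for i in range(n) if i != k]
        Q = P[np.ix_(keep, keep)]          # transitions that avoid the target k
        M[keep, k] = np.linalg.solve(np.eye(n - 1) - Q, r[keep])
    return M

# Hypothetical example graph: (node, node, cost) triples.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 2, 3.0)]
n = 4
A = np.zeros((n, n))
C = np.zeros((n, n))
for i, j, c in edges:
    A[i, j] = A[j, i] = 1.0
    C[i, j] = C[j, i] = c

phi = potential(A, C, theta=1e-5)
M = first_passage_cost(A, C)
gap = np.max(np.abs(phi - M))
half_commute = 0.5 * (M + M.T)             # limit of the potential distance
print("max |phi - first-passage cost|:", gap)
```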
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